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1.  RESEARCH  OVERVIEW 


This  report  describes  the  results  of  our  research  under  contract 
F-33615-80-C-1080 .  This  report  also  contains  results  of  research  only 
partially  supported  by  this  contract.  Our  research  includes  work  at 
many  levels  of  an  Image  Understanding  system,  including  low  level 
feature  extraction,  symbolic  descriptions  and  image  matching.  We  have 
also  been  working  with  Hughes  Research  Laboratories  on  hardware 
implementation  of  IU  algorithms,  using  VLSI  technology.  The  research 
results  are  summarized  below. 

Image  Matching 

Matching  of  an  image  to  a  symbolic  map,  or  a  symbolic  description 
of  another  image,  is  central  for  the  tasks  of  map  updating  and  change 
detection.  In  the  past  we  have  described  results  using  aerial  images 
and  a  relaxation  matching  algorithm  developed  by  Faugeras  and  Price. 
Some  new  results  using  this  algorithm  are  described  in  section  2.1. 

Texture  Analysis  and  Synthesis 

Presence  of  texture  is  a  dominant  cause  of  difficulty  in  the 
analysis  of  aerial  images.  We  have  continued  development  of  several 
texture  analysis  and  synthesis  techniques. 

In  previous  work,  we  developed  techniques  of  structural  texture 
analysis  by  examining  repetition  patterns  of  micro  edges.  These 
techniques  obtained  isolated  2-dimensional  texture  primitives  and  the 
period  of  repetition,  if  any.  For  complex  textures,  the  texture 
primitives  of  different  kinds  are  also  related  in  certain  ways,  and  in 
some  cases,  a  "super-primitive"  is  repeated  regularly.  Section  2.2 
describes  techniques  for  computing  relations  between  texture 
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primitives. 


A  technique  of  texture  analysis  by  Singular  Value  Decomposition 
(SVD)  is  described  in  section  2.3.  This  section  describes  new  results 
of  research  supported  by  earlier  contracts,  and  this  work  is  not  being 
actively  pursued  now. 

Results  of  texture  synthesis  using  stochastic  models  are  given  in 
sections  2.4  and  2.5.  Synthesized  textures  have  very  similar 
appearance  to  the  corresponding  natural  textures. 


Descr iption 


Matchinc 


Shape  of  structural  objects  can  be  described  by  relationships  of 
their  components.  A  technique  of  connecting  fragments  of  runway 
descriptions  into  complete  runways  is  given  in  section  2.6. 


Sections  2.7  and  2.8  describe  use  of  hierarchical  gradient 
relaxation  techniques  for  matching  of  shapes  including  scenes  with 
occlusion . 


Other  IU  Projects 


Sections  2.9  and  2.10  present  new  results  of  previously  reported 
projects  on  motion  detection  and  range  data  processing. 


Hardware  Implementation 


In  continuing  work  with  Hughes  Research  Laboratories,  Malibu, 
California,  we  are  investigating  the  use  of  VLSI  technology  for 
hardware  implementation  of  IU  algorithms.  We  have  chosen  to 
investigate  the  following  algorithms  initially: 
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i)  Nevatia-Babu  Line  Finder 

ii)  Ohlander  Region  Segmentor 

iii)  Laws  Texture  Analysis  System 


The  choice  of  the  above  three  algorithms  was  based  on  tbeii 
computation  intensive  nature,  their  use  for  a  broad  range  of  problems 
and  experience  with  a  large  number  of  images  for  the  first  two.  Also 
these  algorithms  are  largely  local  and  hence  easier  to  implement  in 
VLSI  hardware,  where  reducing  interconnections  is  important.  Further, 
the  three  algorithms  have  common  kernels,  such  as  convolution,  but 
also  require  different  subsequent  processing. 

Detailed  logic  designs  resulting  in  estimates  of  gates  and  the 
number  of  chips  required  to  implement  the  line  finder  and  texture 
system  have  been  completed  and  reported  in  Section  3.  Also,  we  have 
identified  a  common  kernel  for  these  algorithms  and  designed  a  modular 
programmable  digital  processing  element  RADIUS.  It  is  based  on  novel 
concepts  of  residue  arithmetic  and  car  perform  a  variety  of  local  area 
processing  operations  including  convolution  with  very  high  circuit 
function  density. 
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IMAGE  UNDERSTANDING  PROJECTS 


2.1  More  Results  on  Application  of  Relaxation  Matching 


K.E.  Price 


The  relaxation  algorithm  was  described  in  detail  in  the  previous 
semi-annual  report  so  it  will  not  be  discussed  here.  In  this  one  we 
will  only  present  some  additional  results.  In  one  of  the  images  of 
the  previous  report  there  were  3  clusters  of  tanks  with  14,  11  and  6 
objects  in  each  of  the  groups.  Fig.  1  gives  the  results  for  the 
matching  of  these  3  groups.  The  segmentation  of  what  should  have  been 
T10  and  T14  in  the  upper  group  was  imperfect.  T14  was  matched  to  a 
fragment  but  T10  was  not.  Fig.  2  illustrates  how  the  most  likely 
match  changes  through  the  micro  and  macro  iterations  described 
previously.  Initially,  the  tanks  are  assigned  based  only  on  feature 
values,  so  that  all  are  assigned  to  the  image  segment  closest  to  the 
model  tank  size.  After  a  few  micro  iterations,  the  relations  become 
more  important  and  some  assignments  become  stronger.  By  the  second 
macro  iteration,  the  relations  contribute  enough  to  the  initial 
assignments  that  they  are  mostly  correct,  which  is  not  the  case  of 
potential  assignments  at  the  end  of  the  first  macro  iteration. 


4 


Fig.  1.  Results  for  matching  on  3  clusters  of  tanks. 
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Initial  assignments 


Micro  iteration  1 


Fig.  2  Sequence  showing  the  changing  most  likely  assignments 
for  all  objects  in  the  top  cluster  of  tanks  (Fig.  7). 

The  initial  assignments  are  given  with  the  results  for 
each  micro  iteration.  At  the  end  of  the  macro  iteration 
several  units  are  labeled  (indicated  by  *) ,  and  these 
labels  are  carried  to  the  following  iterations  (indicated 
by  **) .  In  the  later  macro  iterations  the  assignments 
do  not  change  so  that  the  results  are  indicated  with  the 
initial  likelihood  values. 


Micro  iteration  4  Macro  iteration  2 

End  of  macro  iteration  1  Initial  assignments 

and  micro  iteration  5 
-  in  parenthesis. 


Fig.  2  continued. 
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Macro  iteration  3 
Initial  assignments 
and  micro  iteration  6 
6  -  in  parenthesis. 


Macro  iteration  4 
Initial  assignments 
and  micro  iteration  7 
7  -  in  parenthesis. 


Fig.  2  continued 


2.2  Determining  Spatial  Relationships  Between  Texture  Primitives 
in  Homogeneous  Regular  Textures 

F.  Vilnrotter*,  R.  Nevatia,  and  K.E.  Price 


Introduction 


In  previous  reports,  we  have  described  programs  used  to  generate 
descriptions  of  natural  textures  [1-2]  and  extract  and  describe 
texture  primitives  [3].  (These  reports  also  contain  a  discussion  of 
other  work  and  a  list  of  related  references) .  Short  descriptions  of 
these  programs  may  also  be  found  in  [4-5]. 


In  the  first  part  of  the  program  edge  repetition  arrays  are 
produced  using  the  edge  and  direction  images  corresponding  to  the 
original  image  (an  edge  repetition  array  is  the  binary  case  of  a  gray 
level  co-occurrence  matrix) .  These  arrays  are  calculated  for  6 
directions  (0,  30,  60,  90,  120,  and  150  degrees),  for  both  dark  and 
light  intensity  objects,  at  distances  within  a  range  specified  by  the 
user.  A  comprehensive  discussion  of  edge  repetition  arrays  is  given 
in  [1]  . 


In  the  second  part  of  the  program  the  edge  repetition  arrays  are 
analyzed  to  determine  whether  there  are  predominant  element  sizes  in 
any  of  the  6  scan  directions  and  if  so  whether  these  elements  occur  at 
regular  intervals  within  the  image.  The  details  of  this  analysis  are 
presented  in  [2]  and  will  not  be  repeated  here. 


In  the  third  part  of  the  program  groups  of  texture  primitives  are 
extracted  and  described,  using  the  texture  descriptions  generated  in 
part  2.  Primitives  are  represented  as  masks.  These  masks  are  then 
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individually  analyzed  to  determine  the  characteristics  (average  size, 
intensity,  etc.)  of  each  primitive  group.  The  details  of  the 
primitive  extraction  and  description  processes  are  given  in  [3], 

To  summarize:  analysis  of  edge  repetition  arrays  provided  one 
dimensional  texture  descriptions.  These  descriptions  are  the  starting 
point  for  a  two  dimensional  texture  primitive  search.  The  products  of 
this  search  are  a  set  of  texture  primitive  masks  and  descr iptions. 

Up  to  this  point  the  only  descriptive  information  we  have 
relating  to  the  arrangement  of  the  texture  primitives  is  the  element 
spacing  information  produced  in  part  2.  However,  this  information 
pertains  to  only  a  single  scan  direction.  Therefore,  we  can  tell  only 
if  a  certain  group  of  primitives  in  periodic  in  one  direction  and  if 
so,  what  period  is  exhibited.  When  no  element  spacing  can  be  found 
for  any  of  the  elements  within  a  texture,  the  texture  pattern  is 
assumed  to  be  random. 

In  the  event  that  we  are  considering  a  non-random  texture  pattern 
the  spacing  information  described  above  is  not  sufficient  to 
characterize  the  particular  placement  rules  within  the  texture. 

Consider  the  2  brick  patterns  in  Figure  1  (a  and  b) .  These 
patterns  are  similar  in  that  each  contains  light  and  dark  textural 
elements  which  are  rectangular  and  are  arranged  in  a  regular  pattern. 
The  arrangement  of  the  bricks  within  the  patterns  is  different,  but  no 
evidence  of  this  difference  is  given  in  their  primitive  descriptions 
(see  Figure  2  a  and  b) .  Both  sets  of  bricks  were  detected  by  scanning 
vertically.  Both  exhibit  an  element  spacing  in  this  direction. 
However,  a  definitive  description  of  element  arrangement  is  lacking. 
This  placement  information  is  contained  in  the  primitive  masks  and 
composite  images,  (see  figs.  3  and  4),  and  is  the  object  of  analysis  in 
the  next  program  section. 
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(a)  Brick  Pattern  1  Subwindow. 


(b)  Brick  Pattern  2  Subwindow. 


Figure  1 . 


12 


r 


PRIMITIVE  ANALYSIS  FOR  BRICK  1 

RELATIVE  INTENSITY  IS  LIGHT  DIRECTION  IS  VERTICAL 
NUMBER  OF  SAMPLES:  &7  PRIMITIVE  NUMBER  IS:  1 
AVERAGE  PRIMITIVE  DIMENSIONS  ARE:  (3-00  AND  52- RE) 

AVERAGE  PRIMITIVE  SIZE  IN  PIXELS  IS:  (10S.3R) 

AVERAGE  PRIMITIVE  INTENSITY  IS:  (ISO.aO) 

PRIMITIVES  REPEAT  AT  ELEMENT  SPACING:  (15-00)  IN  ABOVE  MENTIONED  DIRECTION 

RELATIVE  INTENSITY  IS  DARK  DIRECTION  IS  VERTICAL 
NUMBER  OF  SAMPLES:  1QL  PRIMITIVE  NUMBER  IS:  S 
AVERAGE  PRIMITIVE  DIMENSIONS  ARE:  (IQ.  DO  AND  35-34) 

AVERAGE  PRIMITIVE  SIZE  IN  PIXELS  IS:  (353-14) 

AVERAGE  PRIMITIVE  INTENSITY  IS:  (100- 5R) 

PRIMITIVES  REPEAT  AT  ELEMENT  SPACING:  (15-00)  IN  ABOVE  MENTIONED  DIRECTION 


Figure  2a.  Brick  pattern  1  primitive  texture 
element  description. 
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PRIMITIVE  ANALYSIS  FOR  BRICK  E 

RELATIVE  INTENSITY  IS  LIGHT  DIRECTION  IS  VERTICAL 
NUMBER  OF  SAMPLES:  31  PRIMITIVE  NUMBER  IS:  1 

AVERAGE  PRIMITIVE  DIMENSIONS  ARE:  (5-00  AND  145. 5b) 

AVERAGE  PRIMITIVE  SIZE  IN  PIXELS  IS:  (Bfll.fl?) 

AVERAGE  PRIMITIVE  INTENSITY  IS:  (175-44) 

PRIMITIVES  REPEAT  AT  ELEMENT  SPACING:  (15.00)  IN  ABOVE  MENTIONED  DIRECTION 

RELATIVE  INTENSITY  IS  DARK  DIRECTION  IS  VERTICAL 
NUMBER  OF  SAMPLES:  122  PRIMITIVE  NUMBER  IS:  5 
AVERAGE  PRIMITIVE  DIMENSIONS  ARE:  (fl.00  AND  40-13) 

AVERAGE  PRIMITIVE  SIZE  IN  PIXELS  IS:  (35b- 57) 

AVERAGE  PRIMITIVE  INTENSITY  IS:  (134.55) 

PRIMITIVES  REPEAT  AT  ELEMENT  SPACING:  (15.00)  IN  ABOVE  MENTIONED  DIRECTION 


Figure  2b.  Brick  pattern  2  primitive  texture 
element  description. 


Brick  pattern  1 
composite  primitives. 


(b)  Light  brick  pattern  1 

primitive  found  in  vertical 
scan. 


Figure  3. 
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Methodology 

Individual  primitive  masks  provide  the  locations  of  all  the 
primitives  of  a  particular  type  within  an  image.  Determining  the 
predominant  placement  rules  exhibited  by  these  elements  can  be  divided 
into  a  number  of  independent  subtasks: 

1)  It  must  be  determined  whether  2  primitive  masks  represent  the 
same  textural  elements.  This  situation  arises  when  a  textural  element 
is  detected  in  more  than  one  scan  direction. 

Implementation  -  the  2  masks  under  consideration  are  ANDED  to 
create  a  new  mask.  If  the  resulting  mask  contains  more  than  a  given 
percentage  of  the  non-zero  points  of  either  of  the  original  masks  the 
2  masks  are  combined  and  their  descriptions  are  linked.  All  pairs  of 
primitive  masks  are  tested  and  the  result  is  a  new  set  of  masks  and  a 
reduced  primitive  list. 

2)  Inter-primitive  distances  will  be  measured  along  the  lines 
connecting  primitive  centroids.  Therefore,  all  primitive  centroids 
must  be  known. 

Implementation  -  each  primitive  in  each  primitive  mask  is 
catalogued  along  with  coordinates  of  its  center  of  mass  and  its 
primitive  type.  An  image  containing  the  index  into  this  list  in  the 
locations  corresponding  to  the  non-zero  entries  of  the  mask  for  that 
particular  textural  primitive  is  created  for  future  reference  during 
the  matching  process. 

3)  From  the  primitive  centroids  the  image  will  be  scanned  in  12 
directions  in  the  search  for  predominant  spatial  relationships. 

Implementation  -  the  scan  originating  at  the  center  of  mass  of  a 
primitive  proceeds  in  each  of  12  directions.  When  a  different 
primitive  is  encountered  its  type,  say  A,  and  centroid  location  are 
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noted.  The  angle  and  distance  between  the  2  centroids  is  also  noted 
and  a  1  is  added  to  the  bin  indexed  by  that  primitive  type  (A),  the 
angle  (within  10  degrees),  and  the  distance  (within  3  pixels).  All  of 
the  results  of  the  scans  originating  from  the  same  primitive  type  are 
considered  together.  They  are  stored  in  a  temporary  matrix  for 
further  processing.  Only  the  spatial  relationships  which  occur  most 
frequently  are  part  of  the  final  results. 

4)  The  raw  counts  of  primitive  matches  must  be  normalized  and 
thresholded  so  that  only  the  matches  occurring  most  often  will  be 
considered . 


Implementation  -  All  matches  generated  by  starting  from  the 
centroids  of  a  particular  type  of  primitive,  say  A,  are  stored  in  a 
temporary  matrix.  The  following  steps  are  then  taken: 


(i)  Each  entry  is  normalized  as  follows.  Let  M=#  Matches  from 
primitive  type  A  to  primitive  type  B  at  angle  0  and  distance  D.  Then 


Normalized  (M)  = 


100  x  M 


#  (Prims,  of  Type  A> 


(#Rows-D  Sin  0)  (#Cols-D  Cos  0 ) 
"(iRows  x  #Cols) 


(ii)  All  matrix  entries  greater  than  or  equal  to  a  certain 
threshold,  (THRl*Maximum  Matrix  Value),  are  replaced  by  the  sum  of  9 
entries.  These  9  entries  are  those  whose  degree  and  distance  indices 
are  not  different  from  the  original  entry's  indices  by  more  than  1. 
This  should  insure  that  any  large  volume  peak  which  is  spread  out  will 
not  be  passed  over  in  the  next  thresholding  process. 


(iii)  A  new  matrix  maximum,  NEWMAX,  is  calculated.  Any  entry 
greater  than  or  equal  to  a  certain  threshold,  (THR2 *NEWMAX )  which  is 
also  a  local  maximum  is  added  to  the  set  of  final  results. 


These  final  results  are  a  set  of  spatial  relationships  of  the 

form: 
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PRIM1  =  A 


PRIM2  =  B  Angle  =  9  DISTANCE  =  D  PERCENTAGE  =  P 


This  relationship  states  that  the  centroid  of  a  primitive  of  type  A  is 
separated  from  the  centroid  of  a  primitive  of  type  B  by  a  distance  of 
D  pixels  at  angle  0  P  percent  of  the  time  that  a  primitive  of  type  A 
is  encountered. 

Results 


The  spatial  relationships  generated  for  the  2  brick  texture 
patterns  are  tabulated  in  figures  5a  and  6a.  Please  note  that  angles 
are  found  to  the  nearest  10  degrees,  distances  to  the  nearest  3 
pixels.  The  salient  spatial  characteristics  of  Brick  Pattern  1  (BPl) 
are  captured  in  the  spatial  relationships  generated  for  this  pattern 
(see  figure  5a).  Now  consider  Brick  Pattern  2  ( BP2 ) .  The  primitive 
placement  results  or  spatial  relationships  given  in  figure  6a  are  not 
as  strong  as  the  results  for  the  first  brick  pattern.  All  of  the 
spatial  relationship  percentages  for  BPl  are  above  40%,  while  there 
are  none  above  40%  for  BP2 .  Also,  there  are  many  more  relationships 
listed  for  the  second  pattern. 

A  comparison  of  the  brick  primitives  of  BPl  (figure  3c)  with  the 
brick  primitives  of  BP2  (figure  4c)  will  help  to  provide  some  of  the 
reasons  for  this  difference,  many  of  the  brick  primitives  in  figure 
4c  are  connected  to  neighboring  bricks  or  mortar  due  to  missing  edges. 
When  2  bricks  are  connected  to  form  one  primitive  its  centroid  will 
appear  directly  in  line  with  the  centroids  of  the  bricks  above  and 
below  it.  If  this  happens  enough  times  the  brick  primitive  will 
appear  to  be  related  to  itself  at  angles  of  50  and  -90  degrees.  This 
relationship  does  appear  in  figure  6a  with  a  relatively  high 
percentage  or  relational  occurrence  rate.  Along  with  this  effect 
there  is  the  lowering  of  the  percentages  of  the  actual  (or  correct) 
primitive  relationships  making  thr eshold i ng  less  effective. 
Intermittently  the  vertical  strips  of  mortar  separating  neighboring 
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Brickl  Primitive  Placement  Results 


ANGLE1 

DISTANCE 

PRIM1 

PRIM2 

(DEGREES) 

(PIXELS) 

1 

1 

-90 

15 

1 

1 

90 

15 

1 

2 

-90 

6 

1 

2 

90 

6 

2 

1 

-90 

6 

2 

1 

90 

6 

2 

2 

-180 

45 

2 

2 

-90 

15 

2 

2 

0 

45 

2 

2 

90 

15 

*These  are  used  in  part  b. 

^"Negative  angles  are  counter-clockwise;  positive  angles  are 
clockwise. 


Figure  5a.  Brick  pattern  1  interprimitive  angle  in  degrees, 
distance  in  pixels  and  frequency  of  occurrence. 


TOTAL 

(%) 

52.90 

52.90 

61.36* 

52.25* 


42.38 
50 . 53 

64.09* 

94.55* 

63.12* 

94.55* 
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Figure  5b.  Brick  pattern  1  texture  reconstruction 
using  information  in  a. 


PRIM1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 


2 

2 

2 

2 

2 

2 

2 

2 

2 
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Brick2 


PRIM2 
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1 

1 

1 

1 

1 

1 

2 

2 

2 

2 

2 

2 

2 

2 

2 

2 


2 

2 

2 

2 

2 

2 

2 

2 

2 
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Primitive  Placement  Results 


ANGLE1 

DISTANCE 

TOTAL 

(DEGREES) 

(PIXELS) 

(%) 

-90 

12 

18.28* 

-50 

15 

13.11 

-40 

21 

13.20 

0 

132 

15.48* 

90 

12 

18 . 27* 

130 

15 

13.14 

140 

21 

13.26 

-170 

18 

18.48 

-170 

24 

13.34 

-170 

30 

16.17 

-30 

12 

23.48* 

0 

48 

13 . 68 

0 

72 

14.14 

10 

24 

18.52 

10 

33 

18.76 

80 

6 

20.70 

100 

6 

18.11 

-160 

24 

30 . 56* 

-90 

12 

22.24 

-30 

27 

27.03 

0 

24 

19.93 

0 

45 

33.00* 

20 

24 

30.38* 

90 

12 

23 . 10 

150 

27 

26.32 

180 

24 

20 . 08 

180 

45 

32.57* 

♦These  are  used  in  part  b, 

^"Negative  angles  are  counter-clockwise;  positive  angles  are 
clockwise . 


Figure  6a.  Brick  pattern  2  interpr imitive  angle  in  degrees, 
distance  in  pixels  and  frequency  of  occurrence. 
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Figure  6b.  Brick  pattern  2  texture  reconstruction 
using  information  in  a. 
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bricks  appear  as  separate  primitives.  This  is  responsible  for  the 
occurrence  of  the  horizontal  brick  relationships  for  BP2  at  distance 
24  with  percentage  19.93  for  0  degrees  and  20.08  at  180  degrees. 

Type  1  primitives,  the  horizontal  mortar  strips,  are  also  a 
problem.  (Compare  figure  3b  for  BP1  with  figure  4b  for  BP2).  In  BP2 
the  mortar  strips  are  connected  to  form  primitives  which  are  too  long 
to  generate  meaningful  spatial  relationships  and  too  short  to  be 
considered  as  part  of  the  background. 

In  spite  of  these  problems  a  systematic  search  of  the  information 
in  figure  6a  can  lead  to  a  reasonable  structural  description  of  the 
texture  pattern.  First  try  to  establish  the  repetition  pattern  of 
each  set  of  primitives  along  2  non-colinear  lines.  Hopefully  this  can 
be  done  for  at  least  1  primitive  set.  Starting  with  the  strongest 
grid  pattern  established  try  to  relate  all  other  primitives  to  the 
primitives  composing  this  grid. 

In  BP1  the  only  primitive  with  spatial  relations  in  2 
non-colinear  directions  is  primitive  2,  the  brick  primitive.  It  forms 
a  rectangular  grid  pattern  since  its  spatial  relations  occur  at  angles 
-130,  -90,  0,  and  90  degrees.  The  mortar  primitive,  primitive  1,  is 
related  to  the  brick  primitives  by  vertical  spatial  relationships  at 
distance  6.  Using  the  primitive  analysis  information  in  figure  2a  the 
textural  pattern  can  be  filled  in  to  produce  figure  5b. 

In  BP2  the  primitive  exhibiting  the  strongest  spatial 
relationships  is  the  brick  primitive,  primitive  2.  Its  strongest 
spatial  relationships  occur  at  0  and  180  degrees  at  a  distance  of  45 
pixels  and  at  -160  and  20  degrees  at  a  distance  of  24  pixels.  Ttie 
mortar  primitives  form  a  rectangular  grid  pattern  using  the 
relationships  at  90,  -90,  and  0  degrees.  Note  that  the  distance  of 
132  pixel  at  0  degrees  is  reasonable  because  the  image  is  only  256x256 
pixels.  Hence  2  complete  mortar  primitives  at  approximately  145 
pixels  across  could  not  both  fit  along  the  same  horizontal  line  within 


the  image. 


The  disparity  of  primitive  sizes,  40  pixels  long  for  primitive  2 
and  145  pixels  long  for  primitive  1,  will  lead  to  many  different 
angular  relationships  between  primitives  of  type  1  and  2.  It  is 
necessary  only  to  verify  that  they  are  related  in  some  way  so  that  the 
2  grids  can  be  interposed  at  the  proper  angle  and  distance.  The 
strongest  relationship  is  at  -30  degrees  and  a  distance  of  12  pixels. 
At  this  angle  and  distance  it  would  seem  that  the  2  primitive  types 
actually  form  alternate  rows.  Using  the  primitive  analysis 
information  in  figure  2b  the  textural  pattern  can  be  filled  in  to 
produce  figure  5b. 

The  systematic  interpretation  of  the  relational  results  of 
figures  5a  and  6a  has  not  yet  been  automated  and  the  patterns 
displayed  in  figures  5b  and  6b  have  been  generated  by  hand.  For  such 
a  scheme  to  be  successful  the  patterns  under  analysis  must  be 
homogeneous  regular  texture  patterns. 

Conclusion 


Work  on  this  program  is  still  in  progress.  It  is  hoped  that  the 
relational  information  produced  will  lend  itself  to  automatic 
interpretation.  This  would  lead  to  the  generation  of  more  definitive 
descriptions  of  homogeneous  regular  textures. 
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Introduction 

Singular  value  decomposition  is  a  technique  of  matrix 
transformation  which  has  been  used  as  a  linear  algebraic  method  for 
obtaining  least  square  solutions  for  a  set  of  homogeneous  equations 
[1].  SVD  has  also  been  used  for  calculating  the  pseudo-inverse  of  a 
matrix  [2],  In  the  field  of  image  processing  where  large  matrices  of 
data  such  as  pictures  are  dealt  with,  SVD  can  play  an  important  role 
in  extracting  the  most  useful  information  from  an  exorbitant  amount  of 
highly  correlated  data.  SVD,  therefore,  provides  a  tool  for  image 
compression  and  efficient  picture  transmission  [3]. 

*This  section  describes  previously  unreported  results  of  earlier  work 
supported  by  DARPA. 


In  pattern  recognition,  SVD  has  been  explored  for  automatic  image 
feature  extraction  and  texture  classification  [4-5].  In  this  report, 
a  powerful  technique  for  non-supervised  learning  and  class i f i ca t i on  by 
SVD  will  be  discussed. 

The  mathematical  definition  of  SVD  is 

F  =  U  S  VT  ( 

where,  the  matrix  F  is  decomposed  into  three  matrices  U,  S,  and  V. 
and  V  are  unitary  (for  complex  F)  ,  or  orthonormal  (for  real  F)  and 
is  a  real  diagonal  matrix  which  contains  the  positive  singular  values 
of  F  in  descending  order.  Singular  values  are  excellent  descriptors 
of  elemental  inter-relationships  in  a  matrix  or  picture  which  is 
non-structural .  An  example  of  a  structural  image  is  that  of  man-made 
objects;  and  an  example  of  a  non-structural  one  includes  textural 
images  such  as  pictorial  patterns  of  grass,  raffia,  sand,  or  wool. 

One  of  the  best  models  for  image  texture  is  a  stochastic  one.  In 
considering  a  stochastic  model  for  the  texture  field  F,  the  elements 
of  F  can  be  considered  as  random  variables  with  correlation  among  all 
of  them.  Using  equation  (1),  for  decompbsing  F,  the  elements  of  U,  S 
and  V  matrices  will  also  be  random  variables.  It  has  been 
mathematical ly  proved  and  experimentally  verified  that  variations  of 
elements  in  the  texture  field  F  is  substantially  galvanized  around  its 
singular  values  rather  than  affecting  elements  of  U  and  V  [5].  This 
latter  property  is  used  here  f o r  'non-supe rv i sed  learning. 

Two  sets  of  experiments,  one  with  simulated  texture  and  the  other 
with  natural  textures,  have  been  performed  which  will  be  elaborated 
on . 
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Method  of  Evaluation 


For  evaluating  the  performance  of  SVD  learning,  Bhattacha ryya 
distance  figure  of  merit  is  used.  The  procedure  is  as  follows: 

i)  The  textures'  histograms  are  gaussianized  to  have  zero  mean  and 
unit  variance.  By  taking  this  measure,  all  photographic  biases  are 
removed  and  the  textures  will  only  differ  in  their  internal  shape. 

ii)  Non-overlapping  32x32  sample  windows  are  extracted  from  each 
textural  image. 

iii)  For  every  sample  window,  a  vector  of  singular  values  is 
computed . 

iv)  Each  vector  of  singular  values  is  normalized. 

v)  From  each  normalized  vector  of  singular  values,  mean,  deviation, 
skewness  and  kurtosis  is  obtained  and  a  4-dimensional  feature  vector 
is  formed  [6 ] . 

vi)  For  each  texture  field,  there  are  N  feature  vectors  where  N  is 
the  number  of  sample  windows.  Using  these  feature  vectors,  the  sample 
mean  and  sample  covariance  are  computed. 

vii)  Using  the  sample  mean  and  covariance  of  each  pair  of  texture 
fields,  their  B-distance  is  computed. 


Optimum  Number  of  Samples 


An  experiment  to  determine  the  optimum  number  of  samples  has  been 
performed  with  32,  64,  128  and  196  samples.  It  has  been  found  that  64 
gives  the  optimum  number  of  samples. 


28 


Computer  Simulation  of  Texture 


The  objective  of  this  section  is  to  generate  512x512  texture 
fields  which  are  zero  mean,  normally  distributed,  having  a  separable 
first  order  Markovian  covariance  matrix  with  parameter  p,  where  p  can 
be  given  a  desired  value  between  0  and  1.  For  visualizing  purposes,  a 
linear  transformation  is  performed  on  each  generated  field  to  make  it 
a  8-bit  (integer)  picture  with  dynamic  range  of  0_  to  255 .  Figure  1 
shows  6  simulated  textures  with  normal  distribution  and  correlation  in 
row  and  column  directions  such  that 


K 


C 


(2) 


K  and  K  are  column  and  row  covariance  matrices  and  are  of  toeplitz 
L  R 

form  which  implies  fist  order  Markov  behavior  [7], 

Unsupervised  Learning  with  Simulated  (Artificial)  Texture 

After  generating  the  texture  fields  with  various  p's,  16  32x32 

sample  windows  of  texture  with  p=0.5  is  imbedded  onto  texture  fields 
with  p=0.6,  p=0.7,  and  p=0.9.  The  same  embedments  are  performed  in  a 
reverse  manner. 

NOTATIONAL  CONVENTION  1: 

"0.5"  of  figure  1(b)  represents  the  texture  associated  witii  P=0.5. 

"0. 5/0.9"  of  figure  2(e)  means  the  texture  associated  with  p=0.9  has 
been  imbedded  in  the  texture  associated  with  P=0.9. 


i 


mm 


Figure  1.  Computer  Simulation  of  Texture 


Figure  2  presents  the  imbedded  textures  for  which  B-distances 
between  various  combinator ial  pairs  have  been  tabulated  in  Tables  1, 
2,  and  3. 

REMARK  1:  From  each  of  the  texture  fields  in  Figure  2,  64  32x32  sample 
windows  are  randomly  selected  making  sure  that  the  16  imbedded  ones 
are  included!  This,  in  essence,  means  that  for  example,  "0.5/0. 9" 
provides  75%  "0.5"  +  25%  "0.9"  for  B-distance  computation. 

REMARK  2:  Each  32x32  texture  field  has  mean  zero.  Their  second  moment 
is  tr  (K  ®K_)  =tr  (K  )  xtr  (KD)  .  For  a  toeplitz  matrix  of  32x32,  the  trace 
is  equal  to  32.  Hence,  the  second  moment  becomes 

Second  Moment  =  32x32  =  1024 

We  have  256  of  these  matrices  resulting  in  the  second  moment  of  the 
whole  512x512  texture  to  be 

Total  Second  Moment  ~  256x1024  =  262144 

Therefore,  we  can  conclude  that  all  texture  fields  have  the  same  mean 
(0.0)  and  variance  (262144.0). 

Unsupervised  Learning  with  Natural  Texture 

It  has  been  found  out  that  among  pairs  of  grass,  raffia,  sand, 
and  wool  textures,  the  highest  B-distance  belongs  to  the  raffia-wool 
pair  [6].  In  this  set  of  experiments,  16  sample  windows  of  wool  are 
imbedded  onto  the  raffia  image  and  vice  versa.  Then,  the  B-distances 
are  computed  as  in  the  simulated  case.  Figure  3  displays  raffia,  and 
wool  and  their  imbedded  combinations. 

NOTATIONAL  CONVENTION  2: 


Raffia/wool  means  16  samples  of  wool  are  imbedded  onto  raffia 


Figure  2.  Imbedded  Simulated  Textures. 
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Natural  textures  and  their 
imbedded  version. 
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"0.5"  * 
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0.1716 

"0.5"  ♦ 

"0.6" 

♦ 

1. 5860 

"0.5"  » 

"0.6/0. 5" 
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Table  1.  B-distances  for  various  combinatorial  pairs 
of  simulated  textures,  "0.5",  "0.5/0. 6", 
"0.6",  and  "0.6/0. 5". 
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Table  2.  B-distances  for  various  combinatorial  pairs 
of  simulated  textures,  "0.5",  "0. 5/0.7", 
"0.7",  and  "0.7/0. 5". 
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background.  Upon  extracting  sample  windows,  this  embedment  is  equal 
to  75%  raffia  +  25%  wool. 


Table  4  presents  B-distances  for  various  possible  pairs. 

REMARK:  Each  texture  field  is  real  picture  with  zero  mean,  unit 
variance,  and  gaussian  histogram. 

Analysis 

The  analysis  is  presented  for  Table  _3  for  artificial  textures  and 
Table  £  for  natural  ones.  Table  £  is  chosen  because  it  provides 
higher  B-distances;  and  "0.5/0. 9"  and  "0.9/0. 5"  are  easy  to  visualize. 
However,  the  same  analysis  can  be  given  for  Tables  1  and  2. 

The  B-distance  between  "0.5"  and  "0.5"  is  obviously  zero.  Table 
3  shows  that  B-distance  between  "0.5"  and  "0.9"  is  47.1449  which  is 
considered  to  be  enormous.  The  same  table  displays  B-distance  between 
”0.5"  and  "0.5/0. 9"  (i.e.  75%  "0.5"  +  25%  "0.9")  is  1.5284. 
B-distance  between  "0.5"  and  "0.9/0. 5"  (i.e.  25%  "0.5"  +  75%  "0.9") 
is  2.5166.  The  drop  from  47.1449  to  2.5166  is  clearly  an  indication 
that  a  great  change  in  B-distance  happens  when  75%  of  "0.9"  is  used 
instead  of  100%.  The  drop  is  47.1449-2.5166=44.6283  and  is  so  great 
that  the  bound  of  error  in  telling  it  is  very  close  to  zero. 
Likewise,  when  "0.5"  is  taken  with  100%  of  "0.5”  the  6-distance  is 
zero;  but  when  "0.5"  is  taken  with  75%  "0.5"  +  25%  "0.9",  B-distance 
increases  to  1.5284.  The  classification  error  bound  for  the  latter  is 
8%.  This  means  that  the  two  textures  which  used  to  be  the  same  can 
now  be  separated  with  a  good  probability  of  92%  due  to  existence  of  a 
foreign  texture  in  one  of  them. 

This  is  an  indication  of  the  usefulness  of  singular  value 
decomposition  for  non-supervised  learning  and  classification.  Table  4 
displays  that  B-distance  between  raffia  and  100%  wool  is  11.8746.  The 
distance  drops  sharply  to  1.6211  when  75%  wool  +  25%  raffia  is 
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Table 


Table 


PAIR 

OF  T£*R!RE3 


t  R  -  DISTANCE 

r - 

»  6<*  3  A  y°L  20 


"0.5" 

"0.5" 


*  "0.5/0. 9"  • 


1. 5284 
47.1449 


t  «o.9" 


"0.5"  f  "0.9/0. 5"  * 
"0.5/0. 9"’  "0.9"  * 


2. 5166 
1.7812 


"0. 5' 0. 9" »  "0.9/0. 5"  * 

- 1 


0.3439 

0.9081 


"0.  9" 


*  "0.9/0. 5"* 


3.  B-distances  for  various  pairs  of  simulated 

textures  "0.5",  "0.5/0. 9",  "0.9",  and  "0.9/0. 5" 


PAIR 

OF  r£<TURE> 


r  -  oiorANr^ 
h  R  3  A  l*P  L  r  ~ 


♦  Raffia 

f - 


T  Raffia/Wool* 


0.7128 

11.8746 


f  Raffia 


Wool 


*  Raffia 


*  Wool/Raffia! 


1.6211 

1.8350 


tRaffia/Wool  ’  Wool 


’Raffia/Wool  T  Wool/Raffia 

- 

f  Wool  f  Wool/Raffia 

f  - - - - 


0.5033 

0.9625 


4.  B-distances  for  various  pairs  of  simulated 
textures.  Raffia,  Raffia/Wool,  Wool  and 
Wool/Raffia . 


36 


employed  instead  of  100%  wool.  The  foreign  texture  (i.e.  raffia) 
introduced  in  wool  causes  that  drop  in  the  B-distance.  This  is  also 
clearly  an  indication  that  singular  value  decomposition  can  be  used  as 
a  means  for  non-supe rv ised  learning  for  natural  textures. 

As  it  can  be  observed  from  figure  2(a)  and  (b),  the  embedments  of 
"0.5/0. 6"  and  "0.6/0. 5"  are  barely  noticeable  by  human  eye.  However, 
Table  1  displays  that,  through  the  application  of  the  technique 
presented  in  this  report,  these  embedments  can  automatically  be 
distinguished  by  computer. 
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2.4  A  Best-Fit  Model  Approach  to  Markov  Texture  Synthesis 
D.D.  Garber  and  A. A.  Sawchuk 


Introduction 


A  method  of  generating  texture  simulations  according  to  their  Nth 
order  densities  was  investigated  for  binary  textures  in  an  earlier 
paper  [1].  The  simulations  resulting  from  this  ma rkov  process 
resembled  quite  closely  their  parental  textures  in  most  cases.  When 
applying  a  similar  concept  to  multi-grey  level  imagery,  the  limits  of 
computer  storage  are  soon  reached.  To  circumvent  this  constraint,  a 
new  method  of  texture  synthesis  was  developed  and  applied  to  a  small 
number  of  textures.  The  results  to  present  are  given  in  this  paper. 

N-grams  in  Continuous  Imagery 

In  the  binary  texture  generation  based  on  N-grams  a  single  value 
P (VN+1/V1' • • ' ' VN}  was  stored  £or  each  possible  pattern  (V1,V2,.  -»VN) 
where  Vi=0,l  in  the  binary  case.  This  value  represented  the 
conditional  probability  that  the  next  pixel,  VN+^,  in  tlie  generation 
process  would  be  a  0  or  black  pixel.  The  V^'s  were  chosen  by  a  best 
linear  model  fit  detailed  in  [2]  and  therefore  the  kernel  of  previous 
pixels  (Vj...VN)  is  not  necessarily  contiguous  (see  Figure  1). 
Details  concerning  the  estimation  of  P  (vn+1//v  1  ’  '  *  *  '  VN^  from  a  parent 
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V1  V2 


V  V 
N  N+l 


V3  V5 


Figure  la.  Two-dimensional  synthesis 
(contiguous  kernel). 


Figure  lb.  Two-dimensional  synthesis 
(non-contiguous  kernel) . 


Figure  2.  Passing  kernel  over 
parent  texture. 


texture  are  given  in  [1].  This  single  value  is  sufficient  in  a  binary 

case  to  define  the  distribution  of  V„+2  given  Vj.,...,VN.  The  number 

N 

of  these  values  which  we  are  required  to  store  is  2  .  In  the 
generation  process  each  pixel  is  generated  based  on  the  values  of 

the  pixels  V^,...,V^  surrounding  it  and  on  a  computer-generated 
uniformly-distributed  random  variable.  The  texture  simulations  are 
generated  pixel  by  pixel  along  a  row  until  each  row  is  complete. 
Pixel  generation  along  the  edges  of  an  image  can  be  handled  in  a 
variety  of  ways  but  are  normally  assumed  to  be  any  random  value,  0  or 
1,  if  they  are  outside  the  image  boundaries. 


A  similar  approach  could  be  used  to  generate  multi-grey-level 

N+l 

textures.  For  a  grey-level  texture,  G  values  of  P  (vN+i  /vi '  *  •  -  *  ) 

must  be  stored.  (Actually  only  (G-1)*GN  are  required  as 
G-l 

Z  P  (X/V,  ,  . .  .  ,  V.J  =1  for  all  V.)*  Storage  limitations  are  soon 

X=0  IN  i 

reached.  Also  estimation  of  P (VN+^/V^ » • • • > VN)  is  difficult  as 
multiple  occurrences  of  the  pixel  pattern  V^,...,V  may  not  exist  in 
the  parent  texture.  Therefore  even  without  storage  limitations  the 
problems  of  estimating  P.(VN+^/V^ ,  .  .  .  ,  VN)  ,  which  represents  the 
distribution  of  given  the  values  of  V^,...,VN  is  complex. 


This  estimation  problem  no  doubt  has  a  number  of  ad  hoc 
solutions.  The  problem  is  basically  that  for  large  N  and  or  large  G, 
there  may  not  be  a  suitable  number  of  occurrences  of  the  pattern 

values  of  \P  to  adequately  estimate  the 
given  a  finite  sample  size.  Even 


V  , . . . , V  for  certain 
1  N 

distribution  P (V  /V  ,...,V  ) 

N+l  1  N 

though  a  certain  pattern  never  occurs  or  rarely  occurs  in  our  sample 

parent  texture  it  is  not  implied  that  such  a  pattern  is  impossible  and 

will  never  occur  in  our  simulation  synthesis.  We  might  often  find 

numerous  occurrences  of  this  pattern  if  our  sample  size  or  the  size  of 

our  parent  texture  was  increased,  especially  in  noisy  and 

fine-structured  textures.  But  as  this  very  large  sample  may  not  be 

present,  we  must  estimate  P(V  /V  ,...,V  )  for  all  V  ,...,V  based  on 

N+l  IN  IN 

available  samples. 


One  approach  would  be  to  use  sample  patterns  which  closely 
resemble  but  which  may  not  be  exactly  the  same  as  each  pattern 
(Vp...,VN).  That  is  in  a  pictorial  sense,  we  use  patterns  of 
(Vp...,VN)  which  look  "close  to"  the  pattern  for  which  we  are 
attempting  to  estimate  P (V N+^/V , V N) .  Therefore  samples  in  our 
sample  parent  texture  may  be  used  to  estimate  numerous 
P (VN+l/Vi» • • • • VN)  and  not  just  those  they  fit  exactly.  But  the 
concept  of  "close  to"  must  be  first  numerically  defined.  Given  two 
patterns,  one  from  our  sample  texture  and  the  other  from  the 
conditional  probability  we  are  attempting  to  estimate,  a  distance 
measure  used  to  define  "close  to"  could  be  used  to  determine  the  value 
of  that  sample  in  estimating  p (vn+]/V 1' *  * • * VN) •  If  the  fit  between 
the  kernel  pattern  and  the  pattern  of  interest  is  good  the  associated 
value  of  V 


P(VN+1/V1" 


N+l 

'V- 


in  the  parent  texture  will  be  valuable  in  estimating 


Normally,  when  N  and  G  are  small  or  we  have  a  large  number  of 
samples  and  numerous  samples  for  any  given  V^,...,VN  occur,  we  would 
merely  use  the  histogram  of  the  associated  VN+1  to  estimate 
P  (VN+l/Vl '  *  *  •  '  Vfj )  •  Here  the  relative  number  of  times  a  particular 
value  of  vN+i  occurs  given  a  pattern  indicates  the  conditional 
probability  we  are  attempting  to  estimate.  Where  a  distance  measure 
is  used  instead,  a  good  fit  could  be  considered  to  be  synonymous  with 
high  frequency  of  that  pattern  and  a  poor  fit  with  low  frequency. 

If  such  a  method  of  estimating  these  conditional  probabilities  is 
used  we  are  still  faced  with  a  huge  storage  problem.  To  be  practical, 
the  storage  requirement  must  be  reduced.  From  an  information 
standpoint,  it  is  interesting  to  note  that  a  method  of  estimating 
N-grams  or  conditional  probabilities  P (vn  +  i ^V1 ' ’ *  * ' Vn )  from  a  sample 
parent  texture  image  produces  GN+^  data  values  from  M2  pixel  samples 
where  M  is  the  size  of  the  square  parent  texture  image  in  pixels.  For 
large  G  and  N  this  is  a  drastic  increase  in  data.  But  the  actual 
information  content  can  really  never  be  greater  than  that  content  of 
the  sample  parent  texture  image.  Therefore,  this  M  value  represents 
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an  upper  bound  on  the  amount  of  data  we  should  use  to  generate  a 
texture  simulation.  Any  amount  of  data  exceeding  this  will  contain 
redundant,  useless  data. 

The  Synthesis  Method 

Combining  this  concept  of  upper  bound  with  the  idea  of  forming  a 
distance  measure  to  compare  two  texture  kernel  values  leads  to  a  new 
texture  synthesis  method.  In  this  method,  we  can  generate  the  next 
pixel  based  on  the  pixels  in  the  kernel  surrounding  it  (see  Figure  1) 
and  their  comparison  to  similar  kernels  in  the  parent  texture.  This 
comparison  can  be  accomplished  by  passing  the  kernel  currently  present 
in  the  simulation  process,  over  the  parent  texture  and  computing  the 
distance  function  at  all  possible  points  (see  Figure  2) .  Denoting  the 
pixels  in  the  parent  texture  by  ,  i  ,  j=0 ,  .  .  .  ,M-1  and  the  pixels  in 
the  kernel  V^,...,VN  by  Y^  y  we  can  compute  a  comparison  image 

Ca,b=  C0MPARIS0N(Xi+a,j+b'Yi,j)  (1) 

for  all  a,b  such  that  the  kernel  is  within  the  boundaries  of  the 
texture . 


One  possible  comparison  function  would  be  correlation.  Assuming, 
without  loss  of  generality,  that  our  kernel  is  contiguous  as  in  Figure 
3  and  the  elements  are  denoted  as  Y. .,  this  function  would  be  defined 
as 


E  E  * 


r  , 
a ,  b 


i  _ 3 


.  Y  .  .  ■ 

a+i,b+j  i,g 


[?  ?  Vi,.*;]  [£  £  ...] 
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The  problem  with  this  particular  distance  measure  is  quite  serious. 
Correlation  does  not  take  into  account  differences  in  over-all  mean. 
For  example,  the  kernels  in  Figure  3  are  perfectly  correlated  but 
their  means  differ  significantly. 

A  better  choice  would  be  the  mean  square  difference  (MSD) .  This 
may  be  defined  as 


“Vb  '  ?  ^  (Xi«.j+b-Il,j>2  13) 

where  i,j  must  be  within  the  coordinate  range  of  the  kernel  as  in 
Eq.  (1).  This  particular  comparison  function  weights  the  comparison 
of  all  elements  in  a  kernel  equally.  Having  studied  many  texture 
generation  models  we  immediately  recognize  that  this  fit  is  not 
properly  weighted.  The  few  pixels  which  are  closest  to  Ynext  in 
proximity  are  far  more  important  when  predicting  Ynext  than  those 
which  are  far  away.  So  Eq.  (2)  may  be  modified  to  give  the  weighted 
mean-square  difference  (WMSD) 


WMSD 


a,b 


£  £ 

i  j 


(Xi+a, j+b"Yi 


. )  -W 


(4) 


A  choice  of  W  would  be 
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Perfectly  correlated  kernels. 
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where  R  is  the  euclidean  distance  between  pixel  Y.  .and  the  kernel  eye 

1  >  j 

^ next  and  coolfd'-nates  °£  the  eYe  are  9iven  by  ( *  next'  j  next^  * 

As  the  first  step  in  comparing  a  given  kernel  Y^  s  to  all  kernels 

-1  t  J 

in  the  parent  texture,  for  each  point  (a,b)  in  the  parent  texture, 

ignoring  edges,  the  WMSD  is  computed  resulting  in  an  image  of  WMSD's. 

Where  the  fit  between  the  generated  kernel  Y.  .  and  the  image  Xj  ■  is 

1  •  J  ■L  r  J 

good,  we  would  expect  WMSD^^  to  be  small.  The  smallest  WMSD 
represents  the  "best"  fit  according  to  our  norm.  We  could  choose  the 
^next  associatec'  with  this  best  fit  at  point  (a,b)  to  be  our  next 
pixel  in  the  generation  process,  however  this  can  cause  problems. 
First  of  all,  the  generation  process  would  "lock  in"  on  the  parent 
texture  and  the  generated  texture  could  very  well  become  just  an  exact 
copy  of  the  input  parent  texture.  Second,  we  know  ideally  that  Ynext 
has  a  distribution,  not  just  a  mean.  In  the  autoregressive  model  [3] 
we  gave  Ynext  a  distribution  by  adding  random  noise  to  it.  We  could 
do  that  here.  However  we  are  failing  to  use  additional  information  in 
the  WMSD  image.  There  maybe  a  set  of  points  (a,b),  all  exhibiting  a 
good  fit  to  the  kernel  pattern  Y^  j.  In  fact,  the  best  fit  may  have  a 
noisy  ynext  and  the  other  good  fits  could  provide  information  to 
improve  the  prediction  of  the  Ynext  in  the  generation  process.  Using 
a  set  of  best  fits  is  equivalent  to  increasing  our  sample  size.  We 
look  at  a  set  of  similar  patterns  to  pick  our  Ynext. 

At  this  point  there  are  numerous  ways  to  proceed.  Logically 

those  patterns  with  the  "best"  fit  should  provide  better  estimators 

for  Y  so  some  kind  of  weighting  decision  is  needed  to  choose  the 

next 

relative  importance  of  the  WMSD's  found.  If  we  search  through  the 
WMSD  image  and  find  the  minimum  value,  WMSD^^,  and  scale  all  the 
WMSD's  by  that  we  form  a  new  image  MAXI 
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MAX1a,b 


WMSD 

WMSD 


min 

a,b 


(6) 


This  image  has  the  value  1.0  at  the  best  fit  point  and  values 
0<MAX1<1 . 0  elsewhere. 


Here  we  can  look  at  the  MAXI (a, b)  image  and  study  its  range.  If 

0 . 16<MAXl£l . 0  it  is  implied  that  the  worst  fit  gives  a  0.16  MAXl(a,b). 

Somehow  that  worst  fit  should  be  translated  to  imply  that  the 

probability  of  choosing  the  ^next  associated  with  that  point 

(a,b)  is  nearly  0.0.  The  simplest  way  of  doing  that  is  to  take 

worst 

powers  of  the  image  MAXI (a, b) .  The  maximum  remains  1.0  while  smaller 
numbers  approach  0.0.  For  example  (1. 0)^=1. 0  but  (0 . 1 6 ) ^ =1 xl 0-8 . 
We  do  this  to  obtain  some  kind  of  estimate  of  p(*next/Yi  j)*  After 
studying  the  values  of  MAXI (a, b)  and  its  powers,  the  value  of  16  was 
chosen  and  a  new  image  PDFUNS 

PDFUNS  .  =  (MAXI  i  ) 16  (7) 

a  t  D  a  t  D 


was  chosen  to  estimate  the  probability 
PDFUNS  can  be  scaled  so  that  I  jC 
distributed  random  variable,  r,  [0,1] 
is  found  such  that 


density  function  P(Ynext/'“'i  j)* 
PDFUNS (a , b) =1 .  Then  a  uniformly 
is  generated  and  a  point  (c,d) 
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The  Y  associated  with  the  kernel  shape  at  (c,d)  is  then  used  as 

next 
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the  next  pixel  in  the  generated  image, 
a  full  texture  image  is  generated  with 
pixel  at  each  step,  row  by  row. 

Results 

For  a  kernel  containing  55  pixels  (see  Figure  4)  passed  over  a 

128x128  parent  texture  approximately  7.2x10^  operations  (additions  or 

subtractions)  are  needed  to  get  the  WMSD  image  defined  by  Eq.  (4). 
5  a ,  b 

Another  2.6x10  are  required  to  find  the  next  pixel  according  to 

Eq.  (8).  therefore,  to  generate  a  512x512  texture  requires  only 
12 

1.96x10  (2  trillion)  operations. 

Results  from  texture  synthesis  done  by  this  method  ate  shown  in 
Figure  5  through  9.  Each  of  these  images  is  512x512  pixels.  A 
128x128  section  of  each  original  (parent)  texture  was  used  for  the 
simulation.  Bark  exhibits  very  large  macro  structure  and  this  is  lost 
in  the  simulation.  A  similar  thing  happens  with  raffia  as  the  kernel 
size  is  smaller  than  the  cell  size  of  the  original  texture  but  is  not 
as  pronounced.  The  top  part  of  the  bubbles  texture  was  generated 
using  a  128x128  portion  different  than  that  of  the  bottom  part.  For 
this  reason  the  top  20-30%  of  the  texture  looks  different  from  the 
rest.  The  large  number  of  operation  makes  this  process  very  time 
consuming  even  where  a  pipelined  processor  is  dedicated  to  the  task. 
About  5.5  days  of  dedicated  time  on  an  AP120B  are  required  to  generate 
each  texture. 

This  method  is  of  little  use  at  present  due  to  the  computational 
complexity  of  the  algorithm  but  a  few  points  should  be  brought  out. 
With  constantly  increasing  computer  processing  speeds,  a  simplified 
version  of  this  texture  simulation  method  may  be  implemented  in  the 
near  future.  It  is  even  possible  that  such  computations  could  be 
performed  by  a  properly  constructed  architecture  of  micro  processors. 
In  any  case  such  brute-force  approaches  are  in  many  aspects  simple  and 
could  be  made  cost-effective  in  the  future.  The  results  also  indicate 


The  process  is  continued  until 
the  kernel  window  moving  one 
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Figure  5b.  Simulated  bark. 
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visually  the  amount  of  texture  information  present  in  a  55  pixel 
window  (see  Figure  4)  as  at  each  pixel  generation  step,  the  next  pixel 
in  the  Markov  process  depends  on  only  the  pixels  in  this  neighborhood. 

Finally,  this  approach  is  admittedly  ad  hoc.  Numerous  distance 
measures  could  replace  the  one  chosen  in  this  work  and  each  would  give 
different,  either  better  or  worse,  results.  It  is  always  important 
that  the  process  be  random  and  not  merely  copy  the  texture  sample. 
This  type  of  non-random  process  will  generate  patterns  quickly 
observable  if  the  simulation  is  much  larger  than  the  original.  In 
other  words,  the  histogram  represented  by  P  ^vn+1//V1  '  '  *  ‘  '  VN^  should 
rarely  be  a  delta  function.  A  reduction  in  the  number  of  computations 
could  be  made  if  the  kernel  was  non-contiguous.  Also,  better  results 
could  probably  also  be  obtained  if  the  kernel  window  was  larger.  The 
shape,  contiguity  and  size  of  the  kernel  in  this  study  was  chosen 

primarily  for  computer  programming  considerations. 

Conclusion 

The  results  from  this  best-fit  texture  synthesis  method  are  very 
pleasing  but  the  number  of  computations  required  is  large.  Other 

similar  algorithms  could  be  developed  which  are  simpler  and  could 

possibly  produce  even  better  results.  With  the  decrease  in 

computation  costs  and  the  increase  in  processor  speeds  of  future 
computers,  such  texture  synthesis  methods  could  be  easily  implemented 
in  the  future. 
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Introduction 

In  earlier  work,  a  second-order  linear  texture  synthesis  model 
was  proposed.  In  this  paper  additional  results  of  simulations  using 
that  model  are  given.  Also,  some  results  of  simulations  using 
non-gaussian ,  non-normal  noise  are  presented.  Synthesis  methods 
designed  to  generate  textures  regardless  of  magnification  are 
developed.  Preliminary  results  on  use  of  multiple  linear  models  for 
texture  synthesis  and  the  use  of  the  covariance  matrices  to  segment 
textured  images  are  briefly  discussed.  A  method  for  making  textures 
non-stationa ry  through  post-processing  is  also  examined. 

Second-Order  Ma rkov 

First-order  linear  autoregressive  models  were  among  the  first 
approaches  to  texture  synthesis  [1,2].  The  results  were  appealing  and 
it  was  believed  that  further  improvements  could  be  made  by  expanding 
to  a  second-order  model  [3].  Initial  results  showed  only  a  slight 
visible  improvement  on  the  texture  sand  (see  Figure  2).  It  was  felt 
that  additional  work  should  be  done  on  other  textures  to  determine  the 
improvement  due  to  the  second-order  model  and  that  texture  models 
employing  non-gaussian  and  possibly  non-stat i ona ry  noise  be 
investigated  based  on  the  results  in  [3]. 
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(a)  Original  SAND. 


(b)  Simulated  SAND  with 
cross-products . 


(c)  Simulated  SAND  with 

non-stationary  noise. 


Figure  2. 
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In  the  linear  autoregressive  model,  each  pixel,  VN+^'  in  the 
synthesized  texture  is  produced  by  computing  a  linear  combination  of 
the  pixels  in  the  kernel  surrounding  it  (see  Figure  1)  plus  noise 
c .  That  is 


VN+1  "  ^lVl+02V2+--'+WBO+£-  (1) 

The  noise  e  was  assumed  to  be  independent,  stationary  gaussian  noise 
with  zero  mean  and  fixed  variance.  A  linear  second-order  model  is 
formed  by  adding  all  possible  cross-product  terms  and  the  model 
becomes 


VN+1  -  6iVB2V---*W-VeiiVei2  viV----,8nNVe 


N  N  N 

■  £«iV  S  S  Biivivj+Bo+£ 


(2) 


i=l  j=l 


Adding  second-order  terms  to  a  model  will  always  produce  a  fit  as  good 
as  or  better  than  a  first-order  model  but  the  number  of  computations 
required  to  compute  the  coefficients  and  fit  the  model  are  much 
g  reater . 

Inside  a  circular  radius  of  14  pixels  from  V  ,  there  are  307 
pixels.  To  search  all  possible  cross  products  in  this  region  to  find 
the  most  significant  would  require  over  47,000  cross  products  to  be 
examined.  Computation  of  a  covariance  matrix  containing  all  of  these 
terms  is  impossible  (in  practice).  In  our  study  we  were  limited  to 
investigate  only  820  possible  cross  products  for  entry  into  the 
generation  model.  As  most  of  the  variance  was  explained  by  the  linear 
terms  of  the  model,  most  of  the  cross  products  were  insignificant  from 
a  statistical  point  of  view.  This  selection  procedure  is  detailed  in 
[2],  Those  that  were  significant  were  entered  into  the  model  and  a 
new  texture  was  generated  using  Eq .  (2)  with  stationary  gaussian  noise 
and  with  zero  mean  and  fixed  variance.  The  results  are  shown  in 
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Figures  2  through  5. 


Applying  this  model  to  the  original  image  data  gives  a  residual 
error  image.  The  distribution  of  this  error  and  the  relationships 
between  the  predicted  and  actual  pixel  values  can  be  studied.  Based 
on  this  work  a  method  of  generating  textures  using  non-stationa ry 
noise  was  developed.  A  histogram  of  error  as  a  function  of  predicted 
value,  V  can  be  formed.  This,  in  turn,  can  be  used  to  generate 
errors,  e,  during  the  synthesis  process.  That  is,  at  each  pixel  a 
VN+1  *s  Predicted  according  to  Eq.  (2)  excluding  the  error  term  and 
associated  with  that  predicted  value  is  a  distribution  of  error  (which 
is  likely  to  be  non-normal  and  even  non-zero  in  mean)  from  which  a 
random  error  value  can  be  chosen  to  be  added  to  VN+^.  The  next  pixel 
will  then  be  computed  in  a  similar  manner.  Results  of  texture 
synthesis  formed  using  this  model  are  shown  in  Figures  2-5. 

In  most  cases,  considerable  improvement  is  seen  when  these 
simulations  are  critically  observed.  Of  course,  the  information 
required  to  generate  them  is  considerably  greater  also.  The 
distribution  of  errors  as  a  function  of  VN+^  must  be  condensed  and 
coded  to  some  degree  to  minimize  storage  requirements.  For  a 
256-grey-level  image  VN+^  can  range  from  -50  to  305  and  the  errors 
from  -255  to  +255.  This  would  yield  quite  a  large  amount  of  data  if 
fully  stored.  By  storing  a  small  number  (under  100),  typical  errors 
for  each  range  (and  not  each  single  value  >  0f  VN+1  the  number  of  data 
values  we  are  required  to  store  can  possibly  be  reduced  to  under  1000. 
More  experimentation  on  these  coding  techniques  remains  to  be  done. 

Sk ip-Generate  Method 

Simulating  textures  which  have  a  fine  structure  is  a  much  easier 
process  than  simulating  textures  with  coarse  structure.  This  is 
because  the  linear  model  will  contain  fewer  terms  if  the  texture 
pixels  become  uncorrelated  over  a  small  distance.  For  the  same 
texture  at  a  greater  magnification,  the  pixels  become  highly 
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(a)  Original  BARK. 


(b)  Simulated  BARK  with 
cross-products . 


(c)  Simulated  BARK  with 

non-stationary  noise. 


Figure  3. 
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(a)  Original  RAFFIA. 


(b)  Simulated  RAFFIA  with 
cross-products . 


(c)  Simulated  RAFFIA  with 
non-stationary  noise. 


Figure  5. 
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correlated  and  the  linear  model  will  be  forced  to  contain  more  terms. 
As  the  texture  becomes  more  coarse,  more  time-consuming  statistical 
measurements  must  be  taken  on  the  parent  texture  over  larger  windows. 
To  circumvent  this  problem,  a  new  texture  generation  method  must  be 
developed . 

In  the  texture  work  so  far  [1-3],  pixel  VN+1  was  generated  based 
on  pixels  above  or  to  the  left  of  it  (see  Figure  1).  As  was  discussed 
in  [2],  the  kernel  does  not  have  to  be  contiguous.  This  shape  is 
chosen  to  insure  that  the  image  space  of  our  synthesized  texture  was 
filled  during  the  generation  process.  However,  generating  pixels 
along  a  row,  row  by  row  is  not  the  only  way  of  filling  an  image  space. 

Consider  the  non-contiguous  kernel  mask  in  Figure  6.  If  the 
spacing  between  the  pixels  in  this  mask  is  8,  using  the  linear  model 
in  Eq.  (1)  to  generate  the  right-most  pixel  in  the  bottom  row,  we  can 
generate  every  8th  pixel  along  every  8th  row.  At  each  step  the  next 
pixel  is  generated  based  on  the  previously-generated  pixels  around  it 
(ignoring  boundary  conditions).  After  generating  an  image  with  this 
type  of  spacing,  the  pixels  midway  between  the  previously-generated 
pixels  on  each  row  may  be  generated  using  the  mask  in  Figure  7.  In 
this  mask,  the  pixel  with  the  "x"  in  it  denotes  the  next  pixel,  VN+1, 
to  be  generated  according  to  Eq.  (1).  Naturally,  the  linear  model 
used  in  this  step  will  have  different  coefficients  than  the  previous 
one.  It  is  also  interesting  to  note  that  new  pixels  do  not  depend 
only  on  previously  generated  pixels  above  them  (as  with  the  mask  in 
Figure  1)  but  depend  also  on  the  pixels  below  them.  But  still, 
ignoring  boundary  conditions,  each  pixel  depends  only  on  previously 
generated  pixels.  At  the  next  step  a  mask  similar  to  that  in  Figure  8 
can  be  used  to  fill  in  the  pixels  midway  between  the 
previously-generated  pixels  in  each  column.  Again  pixels  are  allowed 
to  depend  on  pixels  around  them. 

By  repeatedly  using  the  masks  in  Figure  7  and  Figure  8  with 
successively  closer  and  closer  pixel  spacing,  the  texture  simulation 
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Figure  6.  First-Pass  Mask.. 
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Figure  7.  Second-Pass  Mask. 
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Figure  8.  Third-Pass  Mask. 
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Figure  9.  Filled  space  of  skip-generate  method. 
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image  space  is  filled.  An  example  showing  the  pixels  generated  at 
each  successive  pass  is  shown  in  Figure  9.  More  importantly,  to 
determine  the  linear  model  for  each  mask,  only  one  covariance  matrix 
is  required  and  can  contain  as  many  or  as  few  terms  as  desired.  The 
process  of  collecting  statistics  for  one  matrix  is  not  beyond  the 
complexity  that  we  would  want  to  undertake  for  the  small  number  of 
times  required  by  this  process.  Naturally,  any  other  markov  process 
may  be  substituted  for  the  one  in  this  linear  model.  As  before,  only 
the  measurements  required  to  estimate  the  parameters  corresponding  to 
each  mask  need  to  be  taken.  This  number  depends  on  the  spacing  of  the 
pixels  in  the  first  mask,  which  should  be  a  power  of  two.  Other 
odd-shaped  kernels  and  kernels  whose  spacing  is  not  a  power  of  two 
could  be  designed  to  form  space- f i 1 1 i ng  sets.  Most  would  require  more 
models  to  be  estimated  and  would  provide  little  additional 
information. 

Texture  simulations  using  this  method  are  shown  in  Figures  10-15. 
Only  a  slight  improvement  is  seen  in  some  of  the  texture  simulations 
over  the  synthesis  done  by  the  earlier  single  linear  model.  Most  of 
these  textures  are  apparently  well  simulated  by  a  carefully  chosen 
model  and  the  results  are  not  critically  dependent  on  the  coarseness 
of  the  textures.  More  study  remains  to  be  done  with  textures  of 
various  magnifications  to  determine  the  types  of  textures  where  this 
skip-generate  method  provides  improvement. 

Multiple  Linear  Model  Method 

When  generating  textures  using  the  general  linear  model  described 
by  Eq.  (1)  and  the  generating  kernel  in  Figure  1  the  same  model  is 
used  regardless  of  the  values  of  the  pixels  V^,...,VN.  By  developing 
more  than  one  linear  model  and  allowing  the  choice  of  the  model  at 
each  pixel  generation  step  in  the  synthesis  process  to  be  dependent  on 
some  functional  value  of  V^, . . . ,VN,  F(V^,...,VN)  a  new  synthesis  model 
is  formed.  To  illustrate  this  concept  consider  the  data  in  Figure 
16a.  If  we  were  to  fit  one  linear  model  to  the  data  it  would  look 
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Figure  13.  Leather 
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Figure  14.  Sand. 
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Figure  15.  Raffia 
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like  the  single  linear  line  running  through  the  data.  This  linear 
model  could  then  be  used  to  predict  V ^  based  on  the  value  of  in  the 
typical  linear  regression  way.  But  if  we  allow  the  choice  of  our 
linear  model  to  be  dependent  on  the  value  of  then  for  an  incoming 
value  of  we  choose  a  model  whose  domain  includes  V^.  For  6  linear 
models,  the  straight  lines  are  shown  in  Figure  16b.  The  fit  to  the 
data  using  multiple  linear  models  will  always  be  as  good  as  or  better 
than  that  of  the  single  linear  model.  That  is,  the  mean  square  error 
will  generally  be  reduced  using  multiple  models. 


Using  multiple  linear  models  for  texture  synthesis  we  would 

generate  pixe  1  Vi  based  on  pixels  V^,...,V  in  the  following  way. 

First,  we  compute  a  function,  F,  of  the  , . . . , pixels  which  allows 

us  to  choose  the  proper  linear  model.  Then  using  this  model  with  the 

values  V  ,...,V  we  predict  V  ,  and  add  noise.  This  process  is 
1  N  N+l 

diagramed  in  Figure  17. 


Ideally,  the  function  F  should  be  chosen  to  minimize  the  total 
mean  square  error  resulting  from  fitting  the  limited  number  of  models 
to  the  sample  data.  This  is  very  difficult  to  do  in  practice  however 
as  for  N  larger  than  3  we  are  fitting  multiple  hyperplanes  to  data  in 
an  N+l  dimensional  space. 


One  texture  synthesis  of  sand  was  done  using  the  multiple  linear 
model  (see  Figure  18).  In  this  case  eight  models  were  used  and  the 
model  number  was  chosen  by  examining  the  pixel  immediately  to  the  left 
of  the  pixel  being  generated.  The  range  of  this  pixel,  0  to  255,  was 
divided  into  8  equal  subranges  and  the  model  was  chosen  according  to 
the  subrange  into  which  the  value  fell.  Only  a  slight  improvement 
over  the  single  linear  model  synthesis  is  seen. 

Another  method  of  using  multiple  markov  models  is  to  generate  an 
image  of  fields  defining  the  model  number  to  be  used  in  a  second  pass. 
Such  an  approach  would  be  useful  in  simulating  textures  which  have 
multiple  sub-textures  within  them.  A  simple  analytical  example  may  be 
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a.  Single  Linear  Model. 


Figure  16b.  Multiple  Linear  Model 
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Figure  17.  Diagram  of  Multiple  Linear  Model. 
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shown  in  Figure  19.  Real  world  examples  might  include  such  things  as 
a  brick  wall  where  the  texture  of  the  bricks  is  different  than  the 
texture  of  the  concrete  separating  them.  It  was  felt  that  this  type 
of  approach  might  be  useful  in  the  simulation  of  bark  which  has  a 
strong  macro  structure.  A  method  to  separate  this  texture  into  two 
fields  which  would  later  define  the  model  to  be  used  was  designed. 
This  result  (Figure  20)  was  obtained  by  successively  passing  smart 
median  filters  of  varying  sizes  over  a  binary  image  (which  was 
obtained  by  thresholding  an  original  continuous  grey-level  image)  (see 
Figure  3a) .  The  smart  median  filters  replace  the  center  pixel  of  a 
window  with  the  median  only  if  certain  conditions  are  met.  In  the 
binary  case,  the  center  pixel  of  a  square  window  is  replaced  if  the 
percentage  of  black  or  white  pixels  is  above  a  specified  threshold. 
The  threshold  and  window  size  vary  in  the  successive  passes  over  the 
image. 

In  our  actual  simulation,  once  this  field  image  is  obtained  a 
method  must  be  developed  to  simulate  this  field  texture  and  this  field 
texture  will  then  in  turn  be  used  to  choose  the  model  numbers  in  the 
generation  of  final  synthesis.  Generating  textures  with  only  a  few 
grey  levels  can  be  done  using  more  complete  markov  statistics,  perhaps 
n-grams,  but  the  large  size  of  the  fields  may  require  that  a  method 
such  as  the  skip-generate  method  be  used.  This  kind  of  work  has  not 
been  done  yet  but  is  planned  in  the  future. 

Generating  Non-Homoqeneous  Textures 

Previous  to  simulation  attempts,  most  textures  in  our  work  are 
preprocessed  by  statistical  differencing  (see  [4]  for  details).  This 
processing  helps  eliminate  non-homogeneities  in  mean  which  are 
primarily  due  to  lighting  effects.  The  process  of  statistical 
differencing  which  removes  these  non-homogenei ties  can  be  reversed  to 
induce  them  back  into  a  synthesized  texture. 


7  Q 


Statistical  differencing  is  defined  by 


G(j,k)  =  [F(j,k)-F(j,k)l 


r. 


Ac 


.1 


j-  TrvM  _  +  f  1  * 


where  m,  and  a,  represent  desired  mean  and  standard  deviation.  F  is 
d  d 

the  input  pixel  at  location  (j,k)  and  G  is  the  output  pixel  in  the 
statistical  differenced  image.  F(j,k)  and  o(j»k)  are  usually 
estimated  from  the  input  image  over  a  window  containing  pixel  (j,k). 


The  inverse  operation  of  statistical  differencing  can  be  called 
local  moment  modification.  Solving  for  F  in  terms  of  G  we  find  the 
formula  for  local  moment  modification  as 


F(j,k)  = 


Ao (j ,k)+od 


rF(j,k)Aod 

G(j'k)  +[ao  (j  ,k)+od 


Ao. 


- [aM ,+ (1-ot)  F  ( j  ,  k) ] 


(4) 


By  generating  an  image  F(j,k)  and  o  (j»k)  and  defining  A,  a.,  md  and  o 
using  Eq.  (4)  we  can  induce  non-homogenei ties  into  our  simulated 

texture . 

One  such  synthesis  where  F(j,k)  was  assumed  to  be  ramp-like  and 
o(j»k)  was  a  constant  is  shown  in  Figure  21.  Other  more  complex  and 

more  random  F(jrk)  and  o  (j,k)  images  could  be  used  to  create  different 

effects . 

Segmentation  Using  Covariance 

In  [ 2 J  details  concerning  the  estimation  of  a  covariance  matrix 

and  linear  model  determination  from  a  sample  texture  are  given.  This 

covariance  and  the  linear  model  itself  may  be  used  to  identify  and 
segment  textures.  A  little-known  method  for  testing  the  equality  of  t 
covariance  matrices  given  assumptions  of  normality  is  derived  using 


72 


the  standard  maximum  likelihood  ratio  approach  in  [5].  The  resulting 
test  for  t=2  covariance  matrices  is  given  by 

_  2  (5) 
2.3026  mM  Xp(p+1)/2 

where 

M  =  (n1+n2-2)  loglQ  jS  |  -  (n^^-1)  log1Q  jS^  1  -  (n2-l)  loglQ  |S2  |  , 

S ^  and  S2  are  the  covariance  matrices  for  each  group, 

S  =  [ (n1-l)S1+(n2-l)S23/(n1+n2-2) , 

m  =  1_  1  (n^-l)  +  (n^-l )  “  (n;L+n2-2)  ]  [  (2p  +2P~1) /6  (P+1  >  1  * 

and  p  is  the  size  of  the  covariance  matrices  being  tested.  Here  n^ 
and  n2  are  the  sample  sizes  used  to  estimate  and  S2.  The 

derivation  of  Eq.  (5)  is  very  complex  and  will  not  be  detailed  in  this 

paper.  For  fixed  sample  sizes  m  i=  fixed  so  the  primary  distance 

measure  is  M.  This  value  approaches  0  when  the  matrices  are  equal  and 
becomes  large  the  more  they  differ. 

The  composite  texture  shown  in  Figure  22  was  segmented  using  this 
approach.  A  covariance  matrix  was  measured  over  a  48x48  pixel  window 
and  the  16x16  window  in  the  center  of  it  was  compared  to  the  known 

covariance  for  sand,  the  center  texture  in  the  composite.  It  should 
be  recalled  that  sand  is  not  well-described  by  its  covariance  matrix 
as  the  synthesis  of  sand  using  a  linear  model  was  poor.  The 
covariance  matrix  was  16x16  and  described  the  relationships  of  the 
pixels  in  the  pattern  shown  in  Figure  24.  Statistical  measurements 
taken  over  a  maximum  pixel  separation  of  6  were  used  to  produce  a 

distance  measure.  A  scaled  version  of  the  output  of  the  distance 
measure  M  is  shown  in  Figure  23.  As  was  expected,  the  sand  region  is 
picked  out  correctly. 
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Figure  22.  Composite  Texture. 


Figure  23.  Segmented  Composite 


Figure  24.  Covariance  Matrix  Kernel. 


There  is  no  doubt  that  better  methods  of  segmenting  and 
identifying  textures  exist.  The  assumptions  of  normality  and 
independent  samples  on  which  this  test  for  equality  of  matrices 
depends  are  clearly  violated.  This  may  account  for  the  poor 
classification  results.  Better  results  will  be  obtained  when  going  to 
larger  matrices  perhaps  but  the  computational  complexity  likewise 
increases  drastically. 

Conclusion 

New  models  for  texture  synthesis  were  described  in  this  paper 
along  with  results.  It  is  felt  that  many  will  be  useful  in  the  future 
to  do  texture  simulations.  A  texture  identif ication  and  segmentation 
method  was  presented  with  preliminary  results. 
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2.6  Runway  Detection  in  Aerial  Images  of  Airports 


K.R.  Babu 


Introduction 


In  this  report,  the  problem  of  combining  collinear  antiparallels 
[1]  in  order  to  detect  elongated  objects,  such  as  roads  and  runways, 
in  an  aerial  image  is  considered.  The  problem  presents  itself  when  an 
aerial  image  is  processed  by  a  general  linear  feature  extraction 
program  such  as  [1]:  the  program  breaks  the  straight  boundaries  into 
several  disjoint  lines. 

An  image  of  an  airport  area  is  used  as  an  example  to  illustrate 
the  algorithm's  effectiveness.  Fig.  1(d)  shows  the  results  obtained 
by  processing  the  antiparallel  data  of  Fig.  1(c)  with  the  algorithm  to 
be  presented.  The  lines  represent  the  central  axes  of  the  runways. 


Important  Considerations 


It  is  obvious  from  the  nature  of  the  problem,  that  we  have  to 
produce  a  program  that  will  compute,  or  group  together,  collinear 
antiparallels  in  their  correct  spatial  order.  We  now  present  three 
different  solutions  towards  this  end;  each  of  them  differs  from  the 
others  in  the  way  in  which  the  antiparallels  are  treated.  This,  in 
turn,  affects  the  quality  of  the  solution  which  can  be  judged  on  the 
following  three  criteria: 


1.  The  time  complexity  of  the  algorithm.  (Ideally,  the  entire 
algorithm  should  be  analyzed;  however,  it  is  not  always  easy  to 
do  so  analytically;  thus,  a  significant  portion  of  the  algorithm, 
viz.,  only  th°  number  of  antiparallel  pairs  considered  for 
collinearity  evaluation,  is  considered  in  this  presentation). 


2.  Extraneous  antiparallels  that  do  not  represent  elongated  objects. 
Certain  antiparallels  have  to  be  discarded  because  both  of  the 
line  segments  of  such  antiparallels  do  not  have  collinear 
continuity.  In  fig.  2,  the  line  segment  A  does  not  have  a 
collinear  counterpart.  In  most  such  situations,  the  line 
represents  an  irrelevant  detail  surrounding  an  elongated  object 
being  sought;  thus,  in  fig.  2,  antiparallel  Ab  can  be  discarded 
from  any  further  consideration.  This  problem  of  selecting  or 
discarding  certain  antiparallels  can  be  called  the  anti£arallel 
selection  £roblem  (APSP) . 

3.  Computation  of  superfluous  collinear  antiparallel  pairs.  For 
example,  in  fig.  3,  the  following  set  of  collinear  ant i pa ra 1 lels 
would  be  computed  if  all  antiparallel  pairs  are  compared: 

{  (1,2) ,  (1,3)  ,  (1,4)  ,  (2,3)  ,  (2,4)  ,  (3,4)  }. 

What  is  sufficient  for  characterizing  the  entire  elongated  object 
is,  however,  :s  only 


{  (1,2), (2, 3), (3, 4)}. 


Thus,  additional  processing  to  remove  this  superfluity  is 
requ i r ed . 

The  solutions  assume  that  the  im_ge  has  n  anciparallels. 

An  important  aspect  of  a  program  which  tries  to  locate  collinear 
lines  in  an  image  is  a  procedure  which  returns  a  predicate  indicating 
whether  two  given  lines  are  ( approx ima te 1 y  collinear).  While  many 
variations  are  clearly  possible,  the  following  is  a  SAIL-like  [2] 
description  of  what  has  been  used  to  produce  the  results  of  this 


report: 


1 


Figure  3.  A  hypothetical  elongated  object  in  terms  of  its 
linear  features.  Dotted  lines  represent  axes  of 
antiparallels . 
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define  colltol 


II  II  , 


"  a  suitable  integer  for  collinearity  tolerance" 

simple  boolean  procedure  coll  inear ( integer  dl,  d2); 
begin  "collinear" 

"  dl  and  d2  are  the  angles  of  vectors  being  tested  for 
collinearity.  The  vector  represented  by  d2  follows  that 
represented  by  dl.  " 
boolean  ok;  integer  d; 

ok  ;=  d2  lies  between  dl+colltol  and  dl-colltol; 

if  ok  then 

begin 

d  ;=  angle  made  by  the  vector  joining  the  tail  of  1 
to  the  head  of  2;  "  angle  in  0..359  " 

ok  :=  d  lies  between  dl+colltol  and  dl-colltol  AND 
d  lies  between  dl+colltol  and  dl-colltol; 

end; 

return (ok) ; 
end  "collinear"  ; 


Solution  #1 

A  straightforward  algorithm  is  one  where  pairwise  comparison,  for 
collinearity,  of  all  the  ant i pa ra 1 le 1 s  is  performed.  This  process 
forms  several  collinear  antiparallel  pairs;  some  more  processing  is 
required  to  group  the  (collinear)  a nt i pa ra 1 le 1 s  representing  a  single 
elongated  object.  The  characteristics  of  this  algorithm  are  - 

,  ? 

1.  Complexity  of  comparisons  is  0(n  ). 

2.  APSP  can  be  solved  only  by  conducting  a  seaich  among  collinear 
antiparallel  pairs  (cf.  Solution  #3). 
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3.  A  considerable  number  of  superfluous  collinear  antiparallel  pairs 
is  computed  only  to  be  discarded  later. 

Solution  #2 

Instead  of  comparing  all  antiparallel  pairs  for  possible 
coll inea r i ty ,  it  is  possible  to  sort  the  antiparallels  into  buckets 
representing  their  angular  orientation.  Then,  antiparallels  in 
neighboring  buckets  only  need  be  considered  for  the  collinearity 
comparison.  The  characteristics  of  this  algorithm  are  - 

1.  The  complexity  here  is  not  reduced,  compared  to  that  in  solution 

#1.  If  we  assume  uniform  distribution  of  antiparallels  into 
buckets,  the  number  of  antiparallels  in  each  bucket  is 

proportional  to  n,  and  since  collinears  for  n  antiparallels  have 
to  be  computed,  the  complexity  of  comparisons  is  0(n  ),  possibly 
with  a  reduced  constant. 

2.  Same  as  in  solution  *1. 

3.  Same  as  in  solution  fl . 

Solution  #3 

Most  of  the  inefficiency  in  the  above  two  solutions  is  due  to  the 
consideration,  for  collinearity  checking,  of  antiparallel  pairs  wnirh 
are  not  proximate;  thus,  we  are  led,  in  this  solution,  to  exploring  an 
algorithm  which  will  enable  the  program  to  consider  proximate 
antiparallels  only. 

Such  a  consideration  is  easily  achieved  by  a  point  representation 
of  the  antiparallels  on  an  image  grid  (Fig.  4).  Each  antiparallel  is 
represented  by  two  point  vectors ,  each  positioned  at  one  of  the  two 
endpoints  and  pointing  outward  (of  the  enclosed  rectangle  of  the 
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antiparallel)  and  parallel  to  the  antiparallel.  (In  some  situations, 
two  endpoints  may  have  the  same  image  coordinates;  in  that  case, 
relocating  one  of  them  a  pixel  or  two  away  will  be  sufficient  to 
preserve  their  identity  while,  at  the  same  time,  not  significantly 
affect  the  relative  positioning  of  the  antiparallels) , 

The  essence  of  the  algorithm  is  an  operation  which  scans  each 
potential  neighborhood  for  compatible  point  vectors.  This 
compatibility  of  point  vectors  captures  collinear  antiparallels  and  is 
defined  by  - 

1.  the  point  vectors  representing  the  antiparallels  are 

anticollinear ,  i.e.,  the  angles  differ  by  180  +  at ,  a  tolerance, 

and,  the  point  vectors  (Fig.  4),  and 

2.  the  component  segments  of  the  antiparallels  form  two  collinear 
pairs.  Each  application  of  the  operator  will  result  in  at  most 
one  antiparallel  pair. 

This  operation  over  the  entire  image  produces  a  list  of 
non-redundant  collinear  antiparallel  pairs.  These  pairs  are  then 
examined  to  produce  lists  of  pairs  such  that  if  the  pairs  (a^b^), 
(a^, b^ ) , . . . ,  (an,bn)  are  in  the  list,  then  i*2,...,n-l.  Each 

list,  then,  represents  an  elongated  object  or,  equivalently,  a 
collection  of  collinear  antiparallels. 

The  characteristics  of  this  algorithm  are  - 

1.  Since  the  operation  scans  only  around  a  point  vector 
(representing  an  antiparallel),  and  since  the  maximum  number  of 
point  vectors  that  can  come  under  the  purview  of  the  scan  area  is 
a  constant  (determined  by  the  area  covered  by  the  operation),  the 
number  of  collinearity  comparisons  is  0(n). 

2.  Since  for  each  point  vector,  only  one  compatible  point  vector  can 
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exist,  the  extraneous  antiparallels  are  never  computed. 

3.  Because  of  proximity  considerations  inherent  in  the  operation,  no 
superfluous  collinear  antiparallel  pairs  are  computed. 
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2.7  Shape  Matching  Using  Hierarchical  Gradient  Relaxation  Technique 
B.  Bhanu  and  O.D.  Faugeras 


Introduction 


The  problem  of  shape  matching  mainly  consists  of  two  parts  [1]. 

1)  One  shape  is  recognized  to  be  an  approximate  match  to  another 
shape,  and 

2)  A  piece  of  a  shape  is  recognized  as  an  approximate  match  to  a 
part  of  a  larger  shape.  This  problem  is  also  known  as  the  "segment 
matching"  [2]. 

In  this  paper  we  address  the  "segment  matching"  problem  of  shape 
matching  using  hierarchical  gradient  relaxation  technique. 
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The  class  of  shapes  that  we  consider  are  represented  by  simple 
closed  curves  (no  holes)  and  are  two  dimensional  in  nature  such  as  the 
linear  features  in  satellite  images,  borders  on  maps,  outlines  of 
biological  cells  etc. 

One  of  the  earliest  shape  matching  techniques  is  based  on  chain 
code  cross-correlation  [3],  However,  the  usefulness  of  this  method  as 
a  solution  to  the  segment  matching  problem  is  very  much  limited  by  the 
fact  that  cross-correlation  is  not  rotation  invariant,  it  is  very 
sensitive  to  local  changes  in  the  number  of  chain  links  and  also  quite 
sensitive  to  small  global  changes  in  the  shape.  There  exist  a  large 
number  of  statistical  pattern  recognition  techniques  for  shape 
matching  (4].  Unfortunately,  these  feature  based  approaches  cannot  be 
used  for  segment  matching  problem  because  they  suffer  from  the  fact 
that  the  descriptors  of  a  segment  of  a  shape  do  not  ordinarily  bear 
any  simple  relationship  to  the  descriptors  of  the  entire  shape. 
Another  class  of  shape  matching  techniques  are  syntactic  techniques 
(5],  but  they  have  not  been  applied  to  the  segment  matching  problem. 
Davis  [1,6]  studies  segment  matching  using  discrete  relaxation  methods 
which  carry  strong  syntactic  flavor.  Davis  and  Rosenfeld  [7,8]  use 
iterative  methods  for  recognizing  upright  squares  on  a  noisy 
background  and  hierarchical  relaxation  for  waveform  parsing.  However, 
they  do  not  define  a  hierarchical  relaxation  network  and  study  its 
usefulness  and  computational  properties  [1].  Rutkowski  [9,10] 
considers  the  shape  segmentation  of  closed  boundary  curves  such  as 
aeroplane  using  relaxation  methods.  Kitchen  [11,12]  applies  discrete 
and  fuzzy  logic  approaches  of  relaxation  for  matching  relational 
structures.  In  fact,  relaxation  methods  have  been  applied  to  a  wide 
variety  of  problems  in  pattern  recognition,  scene  analysis  and 
artificial  intelligence.  A  good  survey  of  these  methods  is  given  in 
[13].  Rosenfeld,  Hummel  and  Zucker  [14]  introduced  the  idea  of 
probabilistic  relaxation  and  Faugeras  and  Berthod  [15]  reformulated 
the  probabilistic  relaxation  problem  by  explicitly  minimizing  a 
criterion  function.  In  this  paper  we  follow  this  optimization 
approach  of  relaxation,  called  the  gradient  relaxation  method.  This 


method  has  been  applied  to  segmentation,  semantic  description  of 
aerial  pictures,  edge  detection  etc.  [15,16,17].  We  maximize  a 
criterion  function  using  projection  gradient  method  and  solve  the 
shape  matching  problem  in  a  hierarchical  manner. 

In  this  contribution  we  consider  only  two  levels  of  hierarchy  and 
show  that  how  the  method  can  be  generalized.  Various  ways  of 
computing  the  initial  probabilities  and  compatibilities  are  described. 
Interaction  of  the  two  levels  of  relaxation  is  explained  and 
strategies  that  lead  to  faster  computation  are  introduced.  Finally, 
an  example  is  discussed  in  detail. 


Shape  Matching  Using  Gradient  Relaxation  Technique 


In  this  section  we  present  a  two  stage  hierarchical  gradient 
relaxation  method  (fig.  1)  for  matching  the  segments  of  a  template 
against  the  segments  of  an  object.  Let  0* (0Q , 0^ , . . . , 0L_2)  and 
T=(Tg,T^,..., Tn_-^)  be  the  polygonal  path  representation  of  the  object 
and  the  template  respectively.  The  segments  are  the  sides  of  the 
polygon  and  conventionally  a  polygon  will  be  traced  in  the  clockwise 
sense.  In  general  L  may  be  greater,  equal  or  less  than  N. 


In  the  following  discussion  template  elements  will  be  called  as 
units  and  object  elements  as  classes.  Since  a  unit  may  not  correspond 
to  any  of  the  object  elements,  there  exists  a  ( L-l ) thclass ,  known  as 
the  nil  class  and  denoted  by  0^-1'  So  we  ^ave  N  units  and  L  classes. 
Relaxation  is  an  iterative  process  for  labeling  a  set  of  interrelated 
units.  In  "probabilistic  relaxation"  discussed  below,  a  set  of 
estimates  of  class  assignment  probabilities  is  initially  associated 
with  each  unit.  At  subsequent  iterations,  the  probabilities  are 
adjusted  in  accordance  with  the  support  they  receive  from  the  class 
probabilities  of  related  units. 


To  each  of  the  units  T^ ,  we  assign  a  probability  denoted  by  p^ ( k ) 
to  belong  to  class  0^.  This  is  conveniently  represented  as  a  vector 
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?i=[pi(0),  Pi(l)/ 


,p.(L-2)/  p± (L-l ) 1 


pi(i=0. 


The  set  of  all  vectors 

, N-l )  is  called  a  probabilistic  or  stochastic  labeling  of 

the  set  of  units.  Units  are  related  to  one  another,  set  of  units 

related  to  T.  is  denoted  by  V..  The  units  that  are  related  to  T.  are 
11  i 

T  (  (i-1) )  an<^  T((i+1))  '  w^ere  the  indices  are  taken  as  modulo  N,  the 

number  of  units.  (For  the  sake  of  notational  simplicity,  in  the  rest 

of  the  paper  we  shall  not  use  the  modulo  notation  explicitly).  At  the 

first  stage  of  hierarchy  T^  ^  and  T^+1  will  be  called  as  the  left  and 

right  neighbors  of  the  unit  T^.  At  the  second  stage  of  hierarchy 

T.  , ,  T.,  and  T.  ,  will  be  considered  as  an  entity  in  itself.  The 
i-l  i  i+l 

world  model  is  specified  by  the  compatibility  function  C,  which  in 

9 

general  is  defined  only  over  a  subset  S. c  (NxL)“  for  the  first  stage 

and  S2c  (NxL)  for  the  second  stage  of  hierarchy.  At  the  first  stage 

of  hierarchy  C(T.,0  ,T.,05)  measures  the  adequacy  of  calling  unit  T. 

1  K  j  1 

as  0,  and  unit  T.  as  0„  where  T,  e  V. (=T.  or  T.  ) .  Similarly  at  the 
k  3  *•  31  i-l  i+l 

second  stage  of  hierarchy  CCT^,  0^,  T^_^,  0^  ,  Ti+1,  0j_  )  measures  the 
adequacy  of  calling  unit  T^  as  0^,  unit  T^_^as  an^  unit  T^+^  as 

0^  .  For  each  of  the  units  we  also  define  a  consistency  vector 

qi=(qi(0),  q^.  (1)  , .  .  .  ,qi  (L-2)  ,  qi(L-l)]T  that  tells  us  what  pj_  should 
be  given  p.  at  the  neighboring  related  units  and  the  compatibility 

0,.,  T  j , 


function.  For  simplicity  in  the  sequel  we  shall  denote  C(T^, 


0L)  as  C  (  i  ,  k ,  j  ,  l)  and  C(Ti#  0k,  Ti_1» 


ll’  i2’i2 


l  ' 


Ti+1'  °i. 


)  as  C ( i ,  k ,  i 


1' 


As  described  in  [15],  we  define 


Q  (k) 

qj.(k)  “  L^I 

2  Q.U) 
Z  =  0 


whe  re , 
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(2) 


L-l 

Qi(k)  =  23  23  C(i,k,j,4)p  (£) 

j£Vi  1=0  3 

at  the  first  stage  of  hierarchy.  Similarly  for  the  second  stage  of 
hierarchy  we  obtain, 

L-l  L-l 

Q.  (k>  =  2]  23  c(i,k,i,  ,Z  ,  i-  ,Z?)p.  U,)p.  U2)  (3) 

1  Z  =0  i2=0  1  1  1i  X  2 

with  q^(k)  defined  by  (1).  The  global  criterion  that  measures  the 
consistency  and  ambiguity  of  the  labeling  over  the  set  of  units  is 
given  by 

N-l 

c  =  £  Pi'^i  (4) 

i=0 

we  carry  out  the  maximization  of  (4)  using  the  projection  gradient 
approach.  This  criterion  has  been  successfully  used  in  the 
segmentation  and  semantic  description  of  aerial  images  [16,17].  The 
labeling  problem  now  becomes  to  find  the  local  maximum  of  the 
criterion  c  closest  to  the  original  labeling  p  subject  to  the 
constraint  that  p^'s  are  probability  vectors.  This  problem  can  be 
efficiently  solved  using  steepest  descent  techniques  [15].  The 
gradient  of  the  criterion  c  is  obtained  as. 


90 


L-l 

.  }<•...  =  q.  (k)+  2  fT  2  C(j,t,i,k)  •  (p.  (4)-pi-qi)  for  k=0 , 1, .  .  .  ,L-1 

3Pi(k)  i  jeVi  Uj  £=Q  J  J  J 


L-l 

where  D.  =  2  Q-;  (*•) 

J  1=0  J 


(5) 


at  the  first  stage  of  hierarchy.  The  first  term  in  (5)  corresponds  to 

-►  -*•  < 

the  simple  maximization  of  the  product  p^.  in  the  global  criterion 

c,  and  the  second  term  corresponds  to  the  coupling  between  units 
through  the  comoatibility  function.  Note  that  in  general 
C ( j , l, i ,k) ( i , k , j , &)  since  it  depends  upon  the  manner  in  which  the 
compatibility  is  computed.  More  about  this  is  explained  under  the 
compatibility  computation. 

Similarly,  at  the  second  stage  of  hierarchy  the  gradient  of  the 
criterion  c  is  obtained  as, 


3C 

3p~nr 


-  Vk,*D—  10i  <*>-“•  Pi  -5i  )  *  sf-  iQi  OO-D!  p  -5  1,  l6) 

1  111  2  x2  2  2 


where , 


L-l  L-l 


^l(k)  £.  C(ll'4l'13'Z3'i'k)Pi1(ii)Pi. 


(z3) 


L-l  L-l 


^  ^  C  (  i_  ,  2-2  '  ^4  '  ^4  '  P;  ( A  *  ) 

9  5  =n  o  =n  *■  *•  4  H  ±2  *  14  4 


2  l2= 0  Z4=0 
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Fig.  3. 


and  is  determined  to  have  the  largest  possible  value  such  that  p^'s  at 
the  (n+l)th  iteration  still  lie  in  the  bounded^  convex  region  of  LN 
dimensional  Euclidean  space  defined  by  ,  £.  p.(k)*l  and  p.(k)>0, 

k=0  1  i/_v~ 

i*0,...,N-l.  However,  to  obtain  the  faster  convergence  rate  P^  '  is 
obtained  as 

P.(n)  =  a  •  min  [max  'pfn)  (k)  ] 

K 

where  a  is  a  constant  between  0  and  1  and  can  be  used  to  control  the 
rate  of  convergence.  Also 


l-p|n) (k) 

pn(k)  «  - ,  if  CL  >  0 

p{n) (k) 

-  -Hr-  '  if  ci  <  0 


(10) 


A  side  effect  of  computing  foe  every  unit  is  that  we  nay  not 
be  following  the  gradient  exactly.  However,  it  can  be  expected  that 
we  are  approximately  in  the  direction  of  the  gradient  and  the 
criterion  (4)  is  still  maximized.  Although  the  criterion  generally 
increases  but  sometimes  it  decreases  slightly  because  in  (9)  is 
too  large  for  some  of  the  units  and  then  it  continually  increases. 


Details  of  the  Shape  Matching  Algorithm 

In  this  section  we  study  details  of  the  shape  matching  algorithm 
whose  block  diagram  is  shown  in  fig.  1. 

Polygonal  Approximation 


Polygonal  approximation  for  the  template  and  object  is  found  by 
using  the  algorithm  proposed  by  Rosenfeld  and  Johnston  [18],  wnich 
detects  the  points  of  high  curvature.  The  algorithm  works  as  follows. 


Let  R*{ (X^,Y^) be  the  sequence  of  points  describing  a  closed  curve 
so  that  * (xn»Yn) •  At  every  point  of  the  sequence,  smoothed 
k-curvature  is  evaluated  as, 


=  (a . 


where 


(Xi'Xi+k'VYi+k 


) 


(x.-x.  . ,Y.-y.  , 

l  i-k  i  l-k 


) 


C . ,  is  the  cosine  of  the  angle  between  the  vectors  a.,  and  6.,  so  that 
(~ik— anc*  <"ik=~^  ^or  a  strai?bt  line  and  +1  for  the  sharpest  angle 

of  0  degrees.  Thus,  the  local  maxima  of  C. ,  or  the  local  minima  of 

ik 

the  angle  will  correspond  to  the  points  of  maximum  curvature,  which 
will  serve  as  the  vertices  of  the  polygonal  approximation.  Although 
this  method  has  certain  shortcomings  [19],  but  it  works  well  if  the 
largest  k  is  smaller  than  the  distances  between  successive  angles 
along  the  curve.  Normally  smoothing  factor  k  is  taken  about  1/5  or 
1/10  of  the  perimeter. 


Features  Derived  From  Polygonal  Approximation 


Some  of  the  features  that  can  be  derived  from  polygonal 
approximation  of  the  boundaries  of  an  object  are- 


11  Length  of  a  side. 

2)  Intervertices  distance. 

3)  Slope  of  a  side. 

4)  Interior  angle  between  the  two  sides,  and 

5)  Angle  between  the  two  sides  as  shown  in  fig.  3.  This  angle 
corresponding  to  a  vertex  is  equal  to  the  angle  between  the  two  sides, 
where  one  side  is  obtained  by  joining  this  vertex  and  its  neighboring 
counter  clockwise  vertex  and  the  other  side  is  obtained  by  joining  the 
two  clockwise  neighboring  vertices  of  this  vertex.  This  angle  is 


named  as  exangle. 


When  scale  invariant  features  are  derived,  slope,  interior  and 
exangle  features  can  be  used.  For  rotation  invariance  length  of  a 
side  and  intervertices  distance  can  be  used.  For  rotation  as  well  as 
scale  invariance  interior  or  exangle  or  a  combination  of  angle  and 
length  features  can  be  used  in  the  initial  assignment  of 
probabilities. 


Initial  Assignment  of  Probabilities 


The  initial  assignment  of  probabilities  for  a  unit  is  obtained  by 
comparing  its  feature  values  with  the  feature  values  of  all  the  object 
segments.  Depending  upon  the  type  of  invariance  desired,  we  may  need 
the  weight  and  the  strength  of  a  particular  feature.  In  general,  the 
quality  of  correspondence  of  a  unit  i  to  an  object  segment  k  is  given 
by. 


M(Ti'0k) 


N1 

=  S  if 


nP'l  1  ^tni~f°n1 


W  S 
nl  nl 


where  Nl  is  the  total  number  of  features, 

f  =n,  feature  value  for  the  template  element, 
tn.  1 

f  ia*n,  feature  value  for  the  object  element, 
On.  1 

Wn  ^weight  for  the  feature  n^,  and 

Sn^ “strength  associated  with  the  feature  nj_. 


Note  that  for  a  perfect  match  M(T.,0,  )=1  and  for  a  poor  match  M(T.,0  ) 

i  k  ik 

will  be  very  small.  Thus  the  initial  probability  is  assigned  as, 


Pi(k)  = 


>M(Ti,Ok)  ' 


k=0 , 1 , . . . ,L-2 


These  probabilities  are  normalized  so  that  they  sum  to  1.  If  we  use 
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only  either  scale  invariance  or  rotation  invariance  or  both,  we  may 
not  need  weight  and  strength  factors  since  only  one  type  of  features 
(Nl=*l)  are  involved.  However,  if  we  combine  length  and  angles,  then 
we  need  the  strength  and  weight  to  account  for  their  importance  and 
the  different  range  of  values  of  features.  Thus  the  initial 
assignment  of  probabilities  involve  unary  relations. 

Computation  of  Compatibilities 

The  compatibility  function  determines  the  degree  by  which  two  or 
three  assignment  of  the  units  are  compatible  with  each  other.  There 
are  at  least  4  ways  of  computing  the  compatibilities  at  the  first  and 
second  stage  of  hierarchy. 

First  Method 


At  the  first. stage,  we  want  a  transformation  TR  from  unit  T^ 


to 


label  0^  ,  i.e.,  TR:  T^-*-  0^.  TR  consists  of  scale,  rotation  and 

translation  in  the  X  and  'i  directions.  This  transformation  is  applied 


to  the  unit 
computed  as, 


Ti 


and  the  error  between  the  transformed  T.  and  0,  is 

1  i 


M  (TR  (T 


NI 

-j)  >0  )  =  2Z  I  -f__  I  w  s 
J  *  n-j*!  nl  onl  ni  ni 


(12) 


where  ffc,n  3n^  feature  value  for  the  transformed  unit,  and  the  other 
quantities^are  similar  to  those  in  (11). 


Note  that  here  the  features  may  be  slope  and  length  of  the  side, 
so  we  shall  need  the  weight  and  strength  for  these  features,  if  the 
matching  error  is  based  on  these  features.  However,  it  is  possible  to 
avoid  these  parameters,  if  we  use  only  the  distance  between  the  ends 
of  0^  and  transformed  Tj  as  the  matching  error  between  two  segments. 
As  illustrated  in  fig.  4  in  this  case  matching  error 


M (TR (T  ) ,0  ) =AB+CD. 
3  t 


Now  the  compatibility 


CU,k,j,2.)  =  1+iM(TR  (T  )  n  ) 

J  ^ 


The  problem  with  this  method  of  computing  the  compatibilities  is  that 
the  compatibilities  are  not  symmetric  i.e.,  C ( i , k , j , i) ( j , i, i , k ) .  As 
we  have  seen  the  computations  of  gradient  requires  C(j,i,i,k),  so  if 
this  method  is  used  we  will  need  some  modification  in  our  program. 
Moreover,  since  we  are  using  only  one  transformation,  compatibilities 
values  will  not  be  very  accurate  compared  to  the  other  three  methods 
described  below. 


At  the  second  stage  to  compute  C ( i , k , i  , i  , i ^ , 4 ^ )  still  we  find 
one  transformation  as  described  above  and  use  it  to  compute  the  error 
between  the  transformed  i^  and  (Error  1}  and  transformed  i_,  and  C7 
(Error  2 ) .  Then 


C  (i, k,  i 2 ,  £- 2  ) 


_ 1 _ 

1+Error  1+Error  2 


Because  of  the  problems  of  asymmetry  and  inaccuracy,  this  method  is 
not  much  used. 


Second  Method 


Unlike  the  first  method,  here  we  find  two  transformations  TR1  and 
TR2  such  that 


and , 


TR1 :  T .  -  0, 

1  k. 

TR2:  T.  -  0, 

3  i 


Now  the  average  rotation,  average  scale  and  average  translation  (in 
the  X  and  Y  directions)  of  these  two  transformations  is  computed.  The 
transformation  associated  with  these  parameters  called  TV,  is  now 
applied  to  unit  i  and  unit  j  and  the  matching  errors  between  the 
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transformed  units  and  the  elements  0^  and  0 ^  ace  computed  as  in  the 
first  method  and  finally, 


C(i,k, j , 1)  = 


L+Total  Error 


At  the  second  stage  instead  of  finding  two  transformation,  we  find 
three  transformations  and  take  the  average  of  these  values.  This 
average  transformation  is  then  applied  to  units  i,  i^,  i ^  an<3  the 
total  error  between  the  transformed  units  and  object  elements  k,  £  , 
l0  is  computed  to  get  C(i,  k,  i  ,  £^,  i l^)  as  in  the  first  stage. 

Third  Method 


This  method  is  similar  to  the  second  method  in  that  we  compute 

two  transformations  TR1  and  TR2.  Now  TR1  is  applied  to  T_.  giving 

matching  error  M(TR1(T.),0  )  and  TR2  is  applied  to  T.  giving  matching 

j  £  i 

error  M (TR2 (T ) , 0^ ) .  Average  of  this  error  is  taken  and 


C  ( i ,  k ,  j  ,  L ) 


_ 1 _ 

1+Average  Error 


At  the  second  stage,  we  will  find  three  transformations  and  the 
average  error  will  be  the  average  of  six  error  terms  and  the 
compatibility 


i ,k , i. 


'*2!  = 


.+Average  Error 


Fourth  Method 

In  this  method  we  compute  mathematically  the  best  transformation 
from  units  i  and  j  such  that  the  sum  of  the  squares  of  the  error 
between  the  transformed  units  and  the  object  elements  is  minimum. 
Here  we  can  use  only  distance  for  the  computation  of  matching  error 
(unlike  the  first  three  methods  where  we  could  use  a  combination  of 
slope  and  length)  so  that  the  error  criterion  is  linear  and  linear 
least  squares  ideas  can  be  applied.  An  example  is  given  below.  Let 
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the  coordinates  of  segments  i,j,k,  and  i  be  given  by  (X2,Y_)), 

(DX  ,DY  ),  (DX2,DY2),  (R1,S1),  <R2,S2),  (U1,V1)  and  (U2,V2) 

respectively  then 

MX=b 

where 


and  \  is  a  scaling  factor,  d  is  the  rotation  and  X^  and  YQ  are  the 
translations  in  the  X  and  Y  directions  respectively.  The  above  set  of 
equations  is  an  overdetermined  system.  It  can  be  transformed  as 

MTMX=MTb 

and  now  it  will  be  4x4  system,  which  can  be  uniquely  solved  for  \, 

X  ,  Yg .  This  computed  transformation  can  then  be  applied  to  obtain 
compa tibi 1 i t i es  similar  to  the  second  method. 

At  the  second  stage  we  will  have  M  as  12x4  matrix  and  b  a  column 
vector  of  size  12x1.  It  can  be  solved  exactly  as  above  to  obtain  the 
compatibilities.  This  method  requires  more  time  for  the  computation 
of  compa t i bi 1 i t i es ,  than  any  of  the  other  methods.  Also  note  that  the 
second,  third  and  fourth  methods  of  compatibility  computation  lead  to 
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symmetric  compatibilities. 

Initial  Probabi 1 i ty  and  Compatibi 1 i ty  for  the  NIL  Class 

The  assignment  of  initial  probability  and  compatibility  to  the 

nil  class  is  very  important,  since  for  some  of  the  units  there  may  not 

be  any  object  element.  Note  that  0  is  the  nil  class. 

Li""  -L 

Assignment  of  Initial  P robabi 1 i ty  to  the  Nil  Class  (nil) 

There  are  at  least  three  ways  for  the  initial  assignment  of  the 
probability  to  the  nil  class. 

1.  All  the  classes  are  assumed  to  be  equally  likely,  then 
Pi  (nil)  =  pi(0)  =  Pi (1) . . .  =  pt (L- 2 )  =  1/L ; 


where  L  is  the  total  number  of  classes.  However,  it  is  not  a  good 
assignment  since  the  results  of  relaxation  scheme  depends  on  the 
initial  assignment  and  it  is  better  to  use  feature  information  for 
classes  (other  than  nil  class)  rather  than  using  no  information. 

2.  p^(nil)  is  assigned  a  small  constant  value,  depending  upon  the  a 
priori  information  that  we  may  have  about  the  possible  number  of 
matches.  Normally,  we  have  taken  p^(nil)  between  0.05  to  0.25.  The 
actual  value  is  not  critical,  however,  it  affects  the  convergence  of 
probabilities,  hence  the  number  of  iterations  required  to  achieve  the 
desired  result. 

3.  If  the  number  of  units  is  not  very  large,  then  p^(nil)  can  be 
taken  as 

o . (nil)  =  1  -  max  p.(k),  k  =  0 , 1 , . . . ,L-2 

“ 1  k  L 

After  the  assignment  of  probability  to  the  nil  class  in  the  above  two 
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cases  probabilities  are  again  normalized  so  that  they  sum  to  1  for  L 
classes . 


Assignment  of  Compatibilities  Involving  Nil  Class 

At  the  first  stage  of  hierarchy  the  compatibility  is  C  ( i  ,k,  j  ,2, )  . 
Compatibility  involving  nil  class  is  assigned  as  follows.  (Note  that 
CL  .  is  the  nil  class)  . 


C  (i,k, j ,nil)  =  pi(k) 

C  (  i  ,  ni 1 , j ,  2)  =small  number  usually  between  0.05  and  0.25  where,  i 
varies  over  all  classes. 

At  the  second  stage  of  hierarchy  the  compatibility  is  given  by 
C(i,k,i^,  i 2#  Z2)  .  Compatibilities  involving  nil  classes  are  assigned 
as  follows. 

C(i,k,i1,nil,i2,Z2)  =  C(i,k,i2,22) 

C(i,k,i1,z1,i2,nil)  =  C (i,k,  ,  J^) 

C (i ,k, ,nil, i2 ,nil)  =  p  ^  ( k ) 

C  (i ,nil , , i2 , Z2)  =  small  no.  usually  between  0.05  and  0.25 


where  l-L  and  22  vary  over  all  classes. 

Strategies  That  Lead  to  Faster  Computation 

Following  are  some  of  the  strategies  that  have  been  used  to 
obtain  faster  convergence  of  the  probabilities. 

1)  We  can  threshold  a  probability  value  to  zero  if  it  is  less  than  a 
certain  threshold  such  as  10~^.  Once  one  or  more  of  the  components  of 


p  become  zero,  we  don't  compute  the  gradient  and  compatibility  for 
them  and  suitably  take  care  of  it  in  the  computation  of  the  projection 
of  the  gradient. 

2)  We  can  threshold  a  probability  vector  as  the  unit  vector  if  any 
of  the  components  of  p^  becomes  greater  than  a  certain  threshold,  say 
75%. 

3)  Compute  QMk),  compatibility  and  gradient  at  the  first  and  second 
stage  only  for  a  limited  number  of  most  likely  labels.  Usually  this 
number  has  been  taken  to  be  1  or  2. 

All  of  these  features  have  been  incorporated  in  the  program  and 
lead  to  a  marked  reduction  in  the  computation  time.  In  the  next 
section  we  present  an  example  which  illustrates  the  capabilities  of 
hierarchical  gradient  relaxation  method. 


An  Example 

Figs.  5  and  6  show  a  template  and  an  object  respectively.  The 
perimeter  of  the  template  consists  of  21  points  and  that  of  the  object 
38  points.  Various  features  of  the  template  and  object  are  shown  in 
these  figures.  V  and  V  correspond  to  the  X  and  Y  coordinates  of  the 
vertices.  Polygonal  approximation  is  obtained  with  a  smoothing  factor 
of  4.  It  is  to  be  noted  that  the  upper  portion  of  the  template  is  a 
noisy  version  of  the  object  so  the  polygonal  approximation  in  the  two 
cases  are  different.  This  makes  the  matching  problem  little  more 
complicated  than  in  [1]  where  Davis  introduces  the  noise  after  the 
polygonal  approximation,  so  that  the  number  of  edges  and  vertices 
remain  the  same. 


Table  1  and  2  show  the  results  for  the  two  levels  of  hierarchy  (8 
first  stage  iterations  followed  by  8  second  stage  iterations)  at 
various  iterations  and  table  3  shows  the  expected  assignments  for  the 
units  of  the  template.  Initial  probabilities  have  been  computed  using 
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Table  1.  Reduced  compatibility  and  gradient  computation. 


Smoothing  factor  =  4,  a  =  0.99,  nil  class  value  =  0.15 
Type  of  compatibility  computation  =  3 


only  tnc  angle  between  the  two  sides  (i.e.  interior  angle)  and 

compatibilities  have  been  obtained  by  finding  the  average  error  (third 

method) .  In  obtaining  Table  1  we  have  considered  only  one  most  likely 

name  in  computing  compatibilities  and  gradient,  whereas  in  Table  2, 

all  possible  class  names  have  been  considered.  Computation  time  for 

Table  1  is  about  1/20  of  that  for  Table  2.  About  70%  of  the  total 

computation  time  is  spent  in  computing  the  gradient.  Also  since  we 

don't  store  the  compatibility  values  while  computing  the  consistency 

vector,  we  compute  them  again  when  gradient  is  needed.  Assignment  in 

both  cases  (Tables  1,  2)  are  very  reasonable  although  we  did  not 

converge  to  the  same  assignment.  If  we  increase  the  number  of 

iterations  for  the  first  stage  in  Tables  1  and  2,  then  corresponding 

to  Table  1  at  the  15th  iteration  of  the  first  stage,  the  assignment  of 

the  units  is  1,  2,  7,  8,  9,  10  and  corresponding  to  Table  2  it  is  1, 

2,  2,  8,  9,  10.  This  shows  the  need  for  the  second  stage  which  gives 

a  better  labeling  of  units  when  it  is  followed  by  the  first  stage. 

Also  we  have  found  that  generally  the  sum  of  the  iterations  at  the  1st 

and  2nd  stage  is  usually  less  than  needed  by  the  1st  stage  alone  to 

obtain  approximate  labeling  of  units.  In  obtaining  these  tables  we 

have  not  thresholded  the  upper  values  of  the  probabilities  (the  lower 

-4 

values  have  been  thresholded  at  10  ).  However,  if  we  assign  a  unit 
vector  to  a  unit  when  its  probability  for  a  particular  class  gets 
higher  than  a  threshold,  say  60%,  then  the  number  of  iterations  at 
both  stages  reduces. 

Conclusion 

We  have  shown  how  a  hierarchical  gradient  relaxation  method  can 
be  successfully  used  for  the  shape  matching  of  objects.  In  this  study 
although  we  have  considered  only  two  levels  of  hierarchy,  yet  the 
algorithm  easily  generalizes  to  higher  levels  of  hierarchy.  With  the 
increase  in  levels  of  hierarchy,  we  are  using  more  world  knowledge,  so 
the  complexity  of  the  processing  increases,  but  the  reliability  of  the 
assignment  of  units  increases.  First  stage  alone  does  not  resolve  all 
the  ambiguities  of  labeling  even  if  the  number  of  iterations  is  very 


large.  Second  stage  helps  in  resolving  those  ambiguous  labelings. 
Normally  we  have  found  that  about  L/2  number  of  iterations  at  the 
first  stage  followed  by  an  equal  number  of  iterations  at  the  second 
stage  produce  reasonable  matches  when  L  <  20.  Although  the  chain  code 
cross-correlations  and  feature  based  approaches  are  simpler  and 
computationally  less  costly,  yet  they  are  not  as  robust  and  flexible 
as  this  method  is.  Also  this  method  does  not  require  that  the  number 
of  template  elements  be  less  than  the  number  of  object  elements  as  it 
is  needed  in  [ 1] . 
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2.8  Shape  Description  of  Occluded  Objects  Using  Coordinated 

Hierarchical  Gradient  Relaxation  Method 

B.  Bhanu  and  O.D.  Faugeras 

Introduction 

Matching  of  occluded  objects  is  one  of  the  prime  capabilities  of 
any  shape  analysis  system.  Particularly  in  the  analysis  of  sequence 
of  images  occlusion  problem  becomes  of  major  importance  in  the  task  of 
modeling  a  dynamic  environment  [1-3].  Aggarwal  and  Coworkers  [4-6] 
studied  the  problem  of  modeling  the  image  of  a  partially  occluded 
object  and  use  the  derived  model  to  track  the  object  through  partial 
or  even  complete  occlusion.  These  authors  work  on  simulated  binary 
images,  processing  by  systematically  relaxing  the  constraints  on 
object  contours.  Aggarwal  and  Duda  [4]  in  their  study  on  tracking 
clouds  that  partially  occlude  each  other  take  the  polygonal 
approximation  of  the  shape  of  objects.  These  polygons  are  assumed  to 


113 


be  rigid,  however,  holes  in  the  polygons  are  allowed.  The  concept  o£ 
a  true  vertex  and  a  false  vertex  is  used  to  detect  occluding  parts. 
Their  matching  approach  is  sequential  and  suboptimal.  Chow  and 
Aggarwal  [5]  consider  planar  curvilinear  objects  having  no  holes. 
They  also  assume  that  all  the  objects  are  known  to  the  program  before 
any  analysis  is  done.  Their  basic  concept  of  matching  is  the  same  as 
of  Aggarwal  and  Duda  [4].  They  mention  that  their  matching  technique 
can  be  extended  by  matching  the  boundary  of  the  images,  however, 
approaches  such  as  chain  code  cross-correlation  etc.  are  not  suitable 
especially  if  we  are  dealing  with  real  images  [7].  Martin  and 
Aggarwal  (6]  represent  the  shape  of  an  object  by  a  sequence  of 
straight  lines  which  have  been  derived  from  the  graph  of  total 
subtended  angle  vs  arc  length,  i|/-s,  function  of  the  object.  Their 
line-fitting  process  is  the  technique  of  iteration  end-point  fit 
[p.  338,  8]  and  their  segment  matching  approach  is  heuristic  and 
cumbersome.  They  consider  the  tracking  of  individual  boundary 
segments  which  allows  to  decompose  the  contour  of  two  overlapping 
object  images  during  the  tracking  process  into  boundary  parts  related 
to  the  component  objects.  Yachida  et  al.  [3]  apply  the  ideas  similar 
to  Chow  and  Aggarwal  [5]  to  the  natural  images  involving  the  movement 
of  fishes  swimming  in  a  vat.  For  occlusion  problem,  they  consider  the 
boundary  matching  in  a  relatively  simple  fashion.  Roach  and  Aggarwal 
[9]  consider  the  occlusion  problem  in  a  3-D  blocks  world. 

We  view  the  occlusion  problem,  basically  a  segment  matching 
problem  which  involves  matching  the  segments  of  two  or  more  actual 
objects  with  the  apparent  object,  which  is  formed  by  the  occlusion  of 
these  objects.  Of  course,  some  segments  of  the  actual  objects  may  not 
match  with  any  of  the  segments  of  the  apparent  object.  Also  matching 
algorithm  should  not  assign  the  same  segment  of  the  apparent  object  to 
the  segments  of  different  actual  objects  which  are  occluding.  Once 
the  matching  of  actual  objects  with  the  apparent  object  has  been  done, 
it  will  be  a  relatively  simple  matter  to  track  them  and  carry  out  the 
motion  analysis  [15].  The  segment  matching  algorithm  that  we  use  is 
based  on  the  hierarchical  gradient  relaxation  method  [1C].  It  is  an 
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iterative  algorithm  based  on  optimization  approach  for  labeling  a  set 
of  interrelated  units.  In  order  to  use  this  algorithm  for  the  shape 
description  of  occluded  objects,  we  modify  this  algorithm  using  a 
penalty  function  approach  so  that  the  same  segment  of  the  apparent 
object  is  not  assigned  to  the  segments  of  different  actual  objects. 

The  class  of  shapes  that  we  consider  are  represented  by  simple 
closed  curves  (no  holes}  and  are  two  dimensional  in  nature.  These 
shapes  will  be  approximated  by  polygons.  Their  vertices  being  the 
points  of  high  curvature  [11].  We  allow  objects  to  change  in  shape, 
but  such  changes  should  not  be  drastic  otherwise  no  matching  algorithm 
can  work.  Unlike  the  previous  studies  as  mentioned  above  we  also 
allow  a  change  in  scale  so  the  objects  in  general  may  move,  rotate, 
and  their  scale  may  change.  If  the  objects  are  rigid,  then  matching 
will  be  a  relatively  easy  task  for  our  algoritnm.  In  this  paper  we 
describe  the  algorithm  and  study  the  computational  aspects  associated 
with  the  hierarchical  gradient  relaxation  and  penalty  function 
approach.  Finally,  the  results  are  illustrated  with  the  aid  of  an 
example  in  which  two  actual  objects  occlude  to  form  an  apparent 
object . 

Problem  Formulation 


Consider  a  general  case  in  which  M  (>_  2)  actual  objects  occlude 
one  another  to  form  a  single  apparent  object.  In  the  following  we 
shall  call  the  actual  objects  as  templates  and  the  apparent  object  as 
object.  Let  a  template  X  be  represented  by  X* (T. ,Tn , . . . , TM ) ,  where  N 
is  the  number  of  segments  in  the  polygonal  path  representation  of  the 
template.  Similarly,  let  0“  (0^  ,  Oj  ,  .  .  •  ,  0^^  )  be  the  polygonal  path 
representation  for  the  object.  Object  has  L-l  segments. 
Conventionally,  a  polygon  will  be  traced  in  the  clockwise  sense, 
i.e.  keep  the  interior  to  the  right.  We  want  to  match  the  segments  of 
the  templates  against  the  segments  of  the  object  such  that  the 
following  two  conditions  are  satisfied. 
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1)  Any  segment  of  two  different  templates  should  not  be  assigned  to 
the  same  segment  of  the  apparent  object. 

2)  One  or  more  segments  of  the  templates  which  do  not  correspond  to 
any  of  the  segments  of  the  apparent  object  should  be  assigned  to  the 
nil  class,  i.e.,  no  match  class. 

We  shall  call  the  object  segments  as  classes,  and  the  template 
segments  as  units.  Let  the  nil  class  be  denoted  by  0  .  To  each  of 

the  units  ,  we  assign  a  probability  denoted  by  p^  (k)  to  belong  to 

class  0  .  This  is  conveniently  represented  as  a  vector 

^  iC  m  ^ 

[p^ ( 1) / • • • ,p^ (L) ] .  The  set  of  all  vectors  p^  (i=l,...,N)  is  called 
a  probabilistic  or  stochastic  labeling  of  the  set  of  units.  Units  are 
related  to  one  another,  set  of  units  related  to  T^  is  denoted  by  VL  . 
The  units  that  are  related  to  T^  are  T^_^  and  T^+^,  where  the  indices 
are  taken  as  modulo  N.  At  the  first  stage  of  hierarchy  T^_^  and 
will  be  called  as  the  left  and  right  neighbors  of  the  unit  T^ .  At  the 
second  stage  of  hierarchy  ,  and  T.  ^  will  be  considered  as  an 

entity  in  itself.  The  world  model  is  specified  by  the  compatibility 

functions  C,  which  in  general  is  defined  only  over  a  subset 

2  3 

S1  £  (NxL)  for  the  first  stage  and  S2  c  (NxL)  for  the  second  stage 

of  hierarchy.  At  the  first  stage  of  hierarchy  C(T,  ,  0,  ,  T.,  0.) 

1  K  3  c 

measures  the  adequacy  of  calling  unit  T-  as  0V  and  unit  T-  as  0„  where 

Tj  e  Vi  (=T^_^  or  Similarly  at  the  second  stage  of  hierarchy 

C (T^ ,  0k,  T^_^,  0^  ,  T^+^,  0,  )  measures  the  adequacy  of  calling  unit 

T.  as  0,  ,  unit  T.  .^as  C  and^unit  T.  .  as  0,  .  For  each  of  the 

units  we  also  define  a  consistency  vector  q^=[q^(l),  q^(2),  q  ^  C  L ) ] i 

that  tells  us  what  fT  should  be  given  p_.  at  the  neighboring  related 

units  and  the  compatibility  function.  For  simplicity  in  the  sequel  we 

shall  denote  C(T.  ,0,  ,  T,  ,  0  )  as  C(i,k,j,£)  and  C(T.  ,  0  ,  T  .  0  , 

lkji  1  k  i-l  1, 

T.  0  )  as  C  ( i ,  k ,  i  ,  j  ,  i  ,  4,).  1 

1  •  C  2  «  X  fa  to 

As  described  in  [10],  we  define 
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(1) 


where , 


(50 


Qi(k) 


L 

H  Q<U) 
2=1  1 


Q±  (k) 


L 

53  53  ~(i,k,j,2)p  (2) 

:eVi  2=1  3 


(2) 


at  the  first  stage  of  hierarchy.  Similarly  for  the  second  stage  of 
hierarchy  we  obtain, 

L  L 

Qi(k)  =  53  53  C(i,k,i  ,2  i  ,2-)p.  (2.)p.  (2-)  (3> 

2^*1  22=1  1  1  *  ^  11  1  12  2 

with  q^k)  defined  by  (1). 

The  global  criterion  that  measures  the  consistency  and  ambiguity 
of  the  labeling  over  the  set  of  units  of  template  X  is  given  by 


c 


N 

£  *i-5i 


(4) 


let  v  be  the  vector  of  Rp=RLx  . .  .  xRL  (P=NL)  equal  to  (p^,  {S'  ,  .  . .  ,  {J  )  . 

Then  (4)  can  be  written  as 


117 


N 

C  =  2  j,  (V) 

i=l  1 


(5) 


where 


J .  (v)  =  p.  -q. 

1  r  i  n  i 

Now  the  total  criterion  of  consistency  and  ambiguity  for  all  the 
templates  will  be  given  by. 


M  Nm 

M  m 


F<5:1'72 . V  -  E  E  J; 

m=l  i=l  1 


(v  ) 
m 


(6) 


where  N  is  the  number  of  seaments  and  v  is  the  P  (=N  L)  dimensional 
m  m  m  m 

probability  vector  associated  with  the  mth  template  Xm« 

The  condition  that  any  segment  of  two  different  templates  should 
not  be  assigned  to  the  same  segment  of  the  object  can  be  written  as, 


M 

g(V"2 . v  =  S 

i=l 

where  s  is  obtained  from  v  with  the  elements  corresponding  to  the 

**  X, 

nil  class  set  equal  to  zero  for  all  the  units  of  the  template  X  t  and 
g(s;j_,Sj)  is  the  inner  product  of  every  vector  p  x  of  with  every 

vector  Pj  of  Sj.  What  this  condition  essentially  means  is  that  if  a 
unit  i  of  a  template  Xm  belongs  to  the  class  0  (where  2ML) ,  then  sum 
of  the  inner  product  of  the  probability  vector  of  this  unit  p^mwith 
the  probability  vectors  of  all  the  units  of  all  the  templates  should 
be  zero.  The  nil  class  components  have  been  excluded  in  obtaining  s 
from  v^,  because  one  or  more  segments  of  different  templates  may 


M 

E 

j  =  i+l 


g(s-  ,s  .)  =  0 


(7) 
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belong  to  the  nil  class.  Let  us  consider  an  example,  when  there  ace 
two  templates,  i.e.,  M=2.  Also  let  the  number  of  segments  in  template 
X  ^  and  X2  be  3  and  4  respectively,  then 

G(^l'^2>  =  <?(s1,s2)  =  0  (8) 


3  4 

g(ll,s2)  =  (2  Pii^  <£  Pi2} 

=  (Pn+^l^Sl5  ‘  (Pl2+^22+P32+P42) 

=  Pll* (Pl2+P22+P32+P42)+P2l' (Pl2+P22+P32+P42)+ 
P31(P12+P22+P32+P42) 

where 


Pirn’  ‘■W1'-?!,,,'2’' 


.p^lL-D.Ol 


Now  the  segment  matching  problem  can  be  stated  as  follows. 

P roblem  S tatement  (A) 

Given  an  initial  labelings  vj0^,  v , . . . , v^0^  for  the  set  of  M 
templates  to  belong  to  various  segments  of  the  object,  find  the 
labeling  u^,  u2,...,uM  that  correspond  to  the  local  maxima  of 

criterion  (6)  which  is  closest  to  v|0),  v 2 0 ) ,  .  . . , v^° )  subject  to  xhe 
following  constraints. 

(a)  If  u  -(p.  ,p,  )  then  p  is  a  probability  vector  for 

m  lm  2m  Nm  tm 

j* 1,2, ...,N  and  m=l,2,...,M.  For  a  particular  unit  y  of  the  template 

X  ,  this  means  that  the  sum  of  the  components  of  the  vector  p  be 
m  i 111 
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equal  to  unity  and  that  each  component  be  greater  or  equal  or  zero, 
i . e. ,  if 


P  =  CP  (1) 
ym  ym 


P„(2) 

ym 


’Pvm(L) ] 
ym 


then 


E 


£=1 


P 


ym 


U) 


1 


and 


p  (Ji)  >  0  for  >1  =  1,  .  .  .  ,  L 

'ym  — 

(b)  G (v  , v^ , . . . , v^)  as  defined  by  (7)  be  equal  to  zero. 

Note  that  the  criterion  (6}  is  nonlinear.  Constraint  (a) 
involves  linear  equality  and  nonnegativity  restriction  and  constraint 
(b)  is  nonlinear.  In  order  to  solve  this  optimization  problem  we  use 
the  penalty  function  concept  [12-14]  and  modify  the  hierarchical  shape 
matching  gradient  relaxation  technique  [10]. 

Coordinated  Hierarchical  Relaxation  Technique  Based  On  Projection 
Gradient  Method  and  Penalty  Function  Approach 

In  order  to  solve  the  above  problem  using  the  penalty  function 
approach,  we  define  the  penalized  objective  function  [12-14]  as. 


*C(V1'V2' 


■V  = 


F  (v. 


'V 


H  X!  (9  (S  .  ,S  )  ) 

i=l  j-i  +  1  1:1  1]  1  3 


(10) 


where  ^  is  a  penalty  function  and  { d ^ j }  are  penalty  constants,  also 
referred  as  the  coordination  factors.  Since  the  constraint  (b)  given 
by  (7)  is  an  equality  constraint,  the  penalty  function  is  taken  as  the 


120 


simple  quadratic  loss  function  given  by 

(a)  4  -a2  (ii) 

Now  the  problem  described  in  the  earlier  section  becomes 
equivalent  to  that  of  maximizing  (10)  subject  to  the  constraint  (a). 
It  can  be  solved  using  the  gradient  projection  method  applied  to  the 
linear  constraints.  We  have  used  this  method  in  the  hierarchical 
shape  matching  algorithm  described  in  [10).  We  modify  this  algorithm 
with  respect  to  the  penalized  objective  function.  Maximization  of 
(10)  subject  to  the  constraint  (a)  is  equivalent  to  maximizing 


max  F  (v1)+s  (Vj_,  .  .  .  ,vM) 


max  F(v2)+S(v1 , . . . ,v  ) 
v2 


roax  F(v  )+s(v  ,  .  .  .  ,v  ) 
VM 


(12) 


where  S(v^,...fv  )  corresponds  to  the  2nd  term  of  (10).  The  algorithm 
has  been  implemented  in  the  serial  fashion  on  the  computer,  first  we 
maximize  with  respect  to  v^,  then  with  respect  to  v^  and  so  on.  The 
main  modification  of  the  algorithm  [10]  with  respect  to  this  problem 
is  the  computation  of  the  gradient.  Earlier  in  [10]  we  maximized  F(v) 
with  respect  to  v,  but  now  the  contribution  to  the  gradient  also  comes 
from  the  second  term  in  (12) ,  for  example, 


new  gradient  with  respect  to 


old  gradient  with  respect  to  v. 
%  X 

+  5^  [S(V^2 . V] 


As  an  example  let  M=2,  then  the  problem  becomes  to  maximize 


*C(VV  =  F(v1,v'2)-d(c(si,s2)  2 
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subject  to  the  constraints  (a).'  is  taken  as  d  in  the  above 

equation.  In  order  to  solve  this  we  consider 

max  F (v^) -d [g ( , S2) ] ^ 

^  ^  2 

max  F  ( V2 )  [g  ( /  S2 )  ] 

L  ^2 

and  use  the  method  described  in  [10]  with  the  modification  for  the 
gradient  term.  Similarly,  when  M*3,  we  solve  the  following, 

let  A  =  dl2 [g (s1,s2) ] 2+dl3 [g (s: , s3) ] 2+d23 [g  (s2,s3) ] 2 
max  F(v^)-A 

max  F  (v2 ) -A 
^2 

max  F(v3)-A 
v3 

In  general  to  solve  (12)  by  maximizing  with  respect  to  v^  the 
algorithm  can  be  stated  as  follows: 

The  Occlusion  Algorithm 

1.  Pick  an  initial  estimate  of  ( v|° ^  , v^0 ^  , • • • , ^  )  •  This  is  the 
initial  assignment  of  probabilities  to  the  units  of  the  templates. 

2.  Pick  the  penalty  constant  { ^ }  so  that  it  provides  a  suitable 
balance  between  the  associated  first  and  second  terms  of  (12). 

3.  Determine  the  maximum  v^n+^  (m=l ,  2, .  .  .  ,M)  of  the  unconstrained 
penalized  objective  function  (12)  subject  to  the  constraints  (a)  by 
using  the  present  iterate  v^n)  and  projection  gradient  method  [10]. 

Pick  new  penalty  constants  [d^j]  in  order  to  modify  (or  rebalance) 


4. 


the  magnitude  of  the  penalty  terms.  The  magnitude  of  the  penalties 
should  be  increased  to  force  a  closer  approach  to  the  boundary; 
replace  n+1  by  n  and  return  to  3. 

Under  the  assumption  of  continuity  of  function  F  (6)  and 
constraints  (7)  inherent  in  (10)  the  sequence  of  maxima  £or 
m»l,...,M  generated  by  the  above  algorithm  approaches  a  constrained 
maxima  of  the  problem  defined  in  (A).  The  iteration  is  terminated 
after  the  convergence  is  considered  as  adequate.  Details  of  these 
numerical  strategies  have  been  described  in  [10]  and  the  rate  of 
convergence  and  the  possible  ill-conditioning  near  the  boundary  have 
been  discussed  in  [12-14].  However,  in  our  case  since  we  are  seeking 
only  local  maximas,  these  problems  do  not  occur.  In  the  next  section 
we  present  an  example  of  the  segment  matching  where  two  actual  objects 
occlude  each  other  to  form  an  apparent  object.  Block  diagram  of  the 
algorithm  is  shown  in  fig.  1. 


Example 


Figure  2  shows  two  actual  objects  which  occlude  each  other  to 
form  an  apparent  object  shown  in  fig.  3,  Note  that  there  is  a  change 
of  scale  for  actual  object  X ;  it  may  result  because  of  segmentation 
(different  lighting  conditions)  or  changes  in  the  object  itself  even 
if  the  camera  position  remains  fixed.  In  figs.  2  and  3  dotted  points 
show  the  boundary  of  the  objects  and  polygonal  approximation  has  been 
obtained  with  a  smoothing  factor  of  4.  Table  1  shows  the  parameters 
used  in  the  hierarchical  gradient  relaxation  algorithm  [10]  and  tables 
2  and  3  show  the  results  when  only  first  stage  is  used.  An  inspection 
of  figs.  2  and  3  gives  the  most  logical  assignment  (see  Table  4)  for 
the  units  of  X,^  and  X2>  Note  that  for  X1  assignments  are  good,  but 
for  X^ ,  the  assignment  of  unit  1  is  wrong.  It  should  be  19.  Tables  5 
and  6  show  the  results  when  the  first  stage  with  six  iterations  is 
followed  by  the  second  stage.  Again,  in  this  case  assignments  for  X^ 
are  valid,  whereas  for  X^  assignment  for  unit  1  is  wrong  (Table  6). 
Although  this  assignment  is  correct  at  the  second,  third  iterations  of 
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Figure  2.  (a)  First  actual  object  X^,  perimeter  =  34  points 

(b)  Second  actual  object  x2 ,  perimter  =  35  points 
In  each  of  the  figures  the  number  of  vertices  =  9  and 
dotted  points  show  the  boundary  of  objects.  Polygonal 
approximation  is  obtained  with  a  smoothing  factor  of  4 


Apparent  object  formed  as  a  result  of  occlusion 
actual  objects  X,  and  shown  in  fig.  2. 
Perimeter  =  67  points.  Number  of  vertices  =  18 
Dotted  points  show  the  boundary  of  the  object. 
Polygonal  approximation  is  obtained  with  a 
smoothing  factor  of  4. 


Table  1 


Parameters  used  in  the  Hierarchical  Relaxation  Algorithm. 


Smoothing  factor  in  the  Polygonal  Approximation  =  4 
Weight  of  angle  =  1 
Weight  of  length  =  2 

Strength  of  angle  =  strength  of  length  =  1 

Initial  probability  assignment  -  slope  and  length 

of  a  segment  is  used 

Nil  class  value  for  compatibilities  and  initial 
probability  =  0.15 

Compatibility  computation  -  Average  distance  error  method 

Reduced  compatibility  and  gradient  computation  - 

Number  of  likely  labels  at  the  first  and  second 
stage  =  1 

Value  of  a  in  the  iteration  equation  =  0.99 

Upper  threshold  value  for  probabilities  =  0.80 

-4 

Lower  threshold  value  for  probabilities  =  10 


12 


Table  2.  Assignment  of  units  of  actual  object  X,  at  various  iterations  of  the 
first  stage  of  Hierarchical  Relaxation. 
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able  3.  Assignment  of  units  of  actual  object  X2  at  various  iterations  of  the 
first  stage  of  Hierarchical  Relaxation. 


Table  5.  Assignment  of  units  of  actual  object  at  various  iterations  of  the 
first  and  second  stage  of  Hierarchical  Relaxation. 


the  second  stage,  but  it  is  labeled  as  2  at  iteration  6  because  unit  9 
is  labeled  as  19  at  iterations  3  which  causes  a  change  in  the  labeling 
of  unit  1.  One  other  fact  that  can  be  noted  from  careful  examination 
of  tables  2-6  is  tha..  not  only  the  number  of  iterations  is  little  less 
when  the  second  stage  is  followed  by  the  first  stage,  but  also  it 
gives  better  labeling  of  units  compared  to  the  case  when  only  first 
stage  is  used. 

Finally,  Tables  7  and  8  show  the  result  of  the  occlusion 
algorithm.  We  have  shown  the  values  of  the  unconstrained  objective 
function  (first  term  of  (10))  and  penalty  function  term  (second  term 
of  (10))  and  the  penalty  constant  at  various  iterations.  Penalty 
constant  is  chosen  such  that  it  provides  a  balance  between  these  two 
terms.  For  both  objects  X^  and  assignments  are  valid,  however,  the 
convergence  rate  is  little  slower  than  in  tables  5  and  6. 

For  the  unit  1  of  the  assignment  does  not  change  from  19,  once 
it  has  been  assigned  (see  table  6  and  8)  and  it  has  been  found  that 
the  probability  of  this  assignments  increases  with  the  iterations. 
Thus  label  2  is  uniquely  assigned  to  the  unit  2  of  X^.  Also  it  is 
interesting  to  note  the  assignment  of  unit  2  of  X^  in  tables  6  and  8. 

Conclusion 


In  this  paper  we  have  investigated  the  problem  of  shape 
description  of  occluded  objects  based  on  segment  matching.  The 
matching  has  been  done  using  an  earlier  shape  matching  algorithm  [10] 
which  has  been  modified  to  include  the  conditions  for  successful 
segment  matching  when  objects  occlude.  The  algorithm  based  on  a 
hierarchical  gradient  projection  method  and  penalty  function  concept 
is  described  and  an  example  has  been  presented  to  illustrate  the 
results.  We  hope  that  this  framework  provides  a  mathematical  basis 
for  the  solution  of  occlusion  problem. 
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Table  7.  Assignment  of  units  of  actual  object  X.  at  various  iterations  of  the  Occlusion  Algorithm. 
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Assignment  of  units  of  actual  object  X2  at  various  iterations  of  the  Occlusion  Algorithm. 


Penalty  constant 
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2.9  Computation  of  Features  in  the  Analysis  of  Images  of 
Moving  Objects 

B.  Bhanu 


Introduction 


Normally  the  input  for  the  human  vision  system  is  a  complex  scene 
containing  many  objects  with  irregular  shapes  and  the  scene  is 
changing  due  to  the  relative  motion  between  the  observer  and  the 
object.  Different  densities  of  receptors  in  the  eye  cause  different 
actions  by  parts  of  the  retina.  Peripheral  vision  processes  detect 
the  motion  and  provide  gross  information  about  the  scene.  Detailed 
fine  analysis  is  done  by  the  attentive  processes,  wherever  it  is 
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over 


process  of 
distinctive 


necessary  [1J.  Perception  built  over  time,  is  the 
detecting  the  invariants  of  an  object  and  finding 
differences.  In  other  words,  the  perception  of  the  scene  is  the 
result  of  integrating  several  input  frames.  Similar  to  a  human  vision 
system,  a  computer  vision  system  capable  of  analyzing  a  sequence  of 
frames  should  be  able  to  extract  information  not  only  from  each  frame 
but  from  the  sequence  as  much  so  that  a  description  of  the  sequence 
can  be  obtained.  Also  the  amount  and  type  of  processing  should  vary 
with  the  complexity  at  different  places  of  the  scene. 


Movement  is  a  form  of  change,  and  change  is  a  significant 
perceptual  attribute  of  our  world.  The  basis  of  perceived  change  in 
human  being  lies  entirely  in  the  functioning  of  the  'nervous  system. 
Change  is  a  property  which  involves  a  comparison  between  what  the 
object  was  and  what  the  same  object  is  now.  Changes  can  be  perceived 
in  the  location  of  an  object,  its  structure,  size,  shape,  color  etc. 
[2],  Usually  we  can  readily  identify  the  nature  of  the  perceived 
change.  Similarly,  a  machine  vision  system  for  analyzing  a  sequence 
of  images  should  detect  significant  changes  in  the  features  which 
distinguish  the  object  from  other  similar  objects.  Continuing  our 
work  on  the  analysis  of  moving  images  [3],  in  this  paper  we  describe  a 
set  of  features  which  can  be  used  in  the  analysis  of  such  images  and 
show  the  results  on  a  test  image. 


Dynamic  Image  Understanding  Model 

The  key  problem  in  artificial  intelligence  is  to  automatically 
discover  good  distinguishing  feature  representations  for  objects  based 
on  general  world  knowledge.  In  order  to  address  this  problem  and 
carry  out  the  motion  and  shape  analysis  of  objects  in  an  adaptive 
manner  we  consider  a  new  dynamic  image  understanding  model  reported 
earlier  [3].  Its  simplified  block  diagram  is  shown  in  Fig.  1.  Model 
achieves  the  symbols,  features  and  segmentation  control  by  distributed 
feedback.  Depending  upon  the  matching  of  scene  at  a  time  with  its 
prediction  model,  we  do  simple  or  complicated  processing  for  motion  as 
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well  as  shape  analysis.  The  prediction  of  later  scenes 
processing  subsequently.  The  features  computed  in 
fashion  are  used  in  a  hierarchy  of  matching  algorithms, 
segmentation  varies  with  the  dynamics  of  the  scene, 
consider  the  meaning  of  the  feedback  conditions  A,  B,  C, 
Fig.  1. 


should  reduce 
a  controlled 
Note  that  the 
Now  let  us 
D,  E  shown  in 


1)  Want 
matching 


to  compute  more  features  for 
algorithm  (A  or  B) . 


matching 


or  use  a  different 


2)  Want  a  better  matching  algorithm  or  a  better  segmentation  (B  or 
C)  . 

3)  When  the  matching  of  a  scene  with  its  prediction  model  is 
successful,  then  the  prediction  of  the  next  scene  will  be  used  to 
carry  out  the  segmentation  (E) . 


4)  More  symbols/features  are  compurad,  but  it  is  concluded  that  the 
incorporation  of  them  in  the  matching  algorithm  will  not  lead  to 
successful  matching,  then  we  want  to  go  for  better  segmentation  (D) . 


Work  along  the  above  lines  is  presently  under  investigation  and 
will  be  reported  in  the  future.  In  the  next  section  we  describe 
various  features  which  should  be  useful  in  the  analysis  of  images  of 
moving  objects. 


Computation  of  Features 

We  want  to  use  features  for  the  recognition  of  objects  and 
comparison  of  sequence  of  images  to  determine  changes.  Some  of  these 
features  are  similar  to  those  used  in  image  understanding  by  humans, 
while  others  are  not  such  as  moments.  We  classify  the  features  into 
three  categories: 

1)  Basic  features  such  as  area,  perimeter,  centroid,  orientation  etc. 
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2)  Per i ved  features  are  those  which  can  be  obtained  from  basic 
features  such  as  the  ratio  of  area/perimeter 

3)  Contextua 1  features  are. those  which  involve  the  relations  with 
other  objects  such  as  relative  position,  size,  neighbors,  etc. 

While  these  features  are  important  in  their  own  right,  their  rate  of 
change  should  be  useful  in  developing  the  prediction  model.  We  are 
concerned  here  with  the  monochrome  images  and  no  textural  features  are 
used.  However,  if  color  and  texture  information  is  available,  it 
should  help  in  the  analysis  rather  than  making  it  more  complicated. 
Results  will  be  presented  in  the  final  section  on  a  sequence  of  test 
images  shown  in  Fig.  2.  This  figure  consists  of  eight  images  of  size 
256x128  pixels  and  they  are  generated  using  the  system  described  in 
[3]. 


Once  the  segmentation  has  been  done  it  is  assumed  that  the 

boundary  of  the  objects  are  represented  by  chain  code  since  it  is  a 

compact  representation  from  storage  view  point  [4],  However,  to  get 
the  chain  code  for  a  segmented  object  (binary  picture)  may  cause  some 
problems  because  chain  code  requires  the  unique  successor  and 
predecessor  of  a  boundary  point.  (A  boundary  in  a  digital  plane  is  a 
collection  of  points  where  each  pont  is  connected  to  two  of  its 

8-neighbors,  except  for  edge  points  where  "Forks"  exist).  In  order  to 
overcome  these  problems  we  employ  the  following  rules- 

1.  If  all  the  4-neighbors  of  a  pixel  are  zero,  then  throw  it  away, 

1. e.,  we  remove  the  isolated  points. 

2.  If  any  of  the  4-neighbors  of  a  pixel  are  zero,  then  it  is 

considered  a  boundary  point  (5,  p.  339]. 

3.  If  any  of  the  4-neighbors  of  a  boundary  point  obtained  in  step  2 
are  inside  the  boundary,  then  it  will  be  the  boundary  point,  otherwise 
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we  throw  it  away  [6,  p.  62].  This  smoothing  step  eliminates  the  area 
of  an  object  which  is  2  pixels  wide. 


Even  after  the  application  of  the  above  3  rules  we  don't  have 
unique  successor  and  predecessor  for  every  boundary  point. 
Pathological  cases  do  occur.  In  such  cases  we  select  one  of  the 
successors  and  proceed  as  usual  by  creating  a  stack.  If  no  successor 
situation  is  encountered,  then  we  start  from  the  last  situation  where 
there  were  more  than  1  successors  and  proceed.  We  keep  repeating  it 
until  we  get  the  chain  code  for  the  complete  boundary  of  the  object. 

An  object  in  the  sequence  is  identified  by  the  scene  number  and 
the  object  number  in  the  scene.  Object  will  be  described  by  its 
starting  point  and  chain  code  and  it  is  noted  whether  the  object  is  a 
boundary  object  (i.e.,  touches  the  boundary  of  the  frame)  or  its  is 
completely  inside  the  frame.  Note  that  from  the  chain  code  and 
starting  point,  we  can  uniquely  find  the  coordinates  of  the  boundary 
points,  whenever  necessary. 

Computation  of  Basic  and  Derived  Features 

Size  and  Shape 

The  size  of  an  object  includes  features  such  as  width,  length, 
maximum  limits  of  extent  (maximum  and  minimum  X-  and  Y-  values) ,  area 
etc.  For  an  object  we  define, 

XMAX  -  Maximum  X  value 
XMIN  -  Minimum  X  value 
YMAX  -  Maximum  Y  value 
YMIN  -  Minimum  Y  value 

From  the  coordinates  of  the  boundary  XMAX,  XMIN,  YMAX,  and  YMIN  can  be 
easily  obtained.  Also  X  and  Y  center  of  the  enclosing  rectangle  can 
be  obtained  as. 
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X  Center  of  Rectangle  =  -MAX.  XM1N 


and 

Y  Center  of  Rectangle  =  Y-MIN 

Furthermore,  width  and  length  of  the  rectangle  can  be  obtained  as, 

width  =  XMAX  -  XM IN 


and 


length  =  YMAX  -  YMIN 


and 


Aspect 


ratio  »  width/length  = 


XMAX  -  XMIN 
YMAX  -  YMIN 


The 

area 

This 

is 

Area 

is 

such 

as 

of  an  object  is  just  the  number  of  points 
computed  by  counting  the  number  of  points 
also  needed  in  the  computation  of  some  oth 
centroid,  orientation  etc. 


that  it 
inside  the 
er  basic 


covers . 
ob j  ect . 
features 


Shape 


There  are  almost  limitless  ways  in  which  perceived  shape  may  be 
classified,  for  example,  circularity,  angularity,  elongation, 
symmetry,  complexity  and  so  on.  However,  we  are  concerned  with  those 
features  which  can  be  easily  computed.  Commonly  used  features  are 
perimeter,  the  ratio  area/perimeter^ ,  orientation,  moment  of  inertia, 
length  of  the  radius  vector  from  centroid  to  the  perimeter,  a  count  of 
the  number  of  corners  of  the  shape  etc.  Now  we  shall  elaborate  on 
these  features. 
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Perimeter:  It  is  just  the  number  of  boundary  points  of  an  object.  The 
ratio  Area/Perimeter  *  is  a  measure  of  compactness  of  the  object.  It 
is  a  dimensionless  quantity.  It  will  be  maximum  for  circles  in  a 
continuous  world. 

Orientation:  The  aspect  ratio  as  described  above  suffers  from  the  fact 
that  it  is  dependent  upon  the  orientation  of  the  object.  We  would 
like  to  obtain  the  rectangle  which  just  encloses  the  shape  in 
arbitrary  orientations  such  that  the  length  of  the  rectangle  is 
parallel  to  the  principal  axis.  Direction  of  the  principal  axis  is 
the  direction  of  the  major  axis  and  the  moment  of  inertia  is  minimum 
along  this  axis.  Let  0  be  the  angle  between  the  principal  axis  and 
the  horizontal  axis.  Let  X  and  Y  be  the  centroid  of  the  object,  then 


££xf(x,y)  ££yf  (x,y) 

X  =  ^  y  y  —  x  v 

X  y  X  y 

m20  =  ££  (x-X)2f(x,y) 

x  y 

m02  =  EE  Cy-Y) 2f (x,y) 
x  y 

mll  =  £2  (y-Y)  f  (x  ,y) 

x  y 


where  f(x,y)  is  the  characteristic  function  of  the  object,  which  is  1 
inside  or  on  the  object  boundary  and  zero  outside.  Note  that  the 
denominator  in  the  equations  for  X,"?  is  the  area.  The  orientation  9 
is  given  by. 


14S 


tan  29 
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2m 


11 


m20~m02 


It  is  to  be  noted  that  9  has  an  ambiguity  of  tt  radians, 
this  ambiguity  sometimes  it  is  useful  to  define  the 
nearest  the  angle  of  maximum  radius  vector.  Now 


3ecause  of 
orientation 


Major  axis  =  length  of  the  orientation  independent  rectar.ole 
=  max  { (x^-X) cos9  + (y.-Y) sin9 ; 

+  max  {  (X-x?)cos0+(?-yJsin9  • 

(x2,y2) 


where  (x,,y  )  and  (x2'^2^  ran9e  over  the  boundary  points. 

axis  which  is  equal  to  the  width  of  the  orientation  j«- ■ 

rectangle  is  obtained  from  (2)  with  9  replaced  by  I. 

The  moment  of  inertia  of  f  (x,y)  about  the  line  y*xtan-  rer.tr:.: 
of  f(x,y)  is  taken  as  origin)  is  given  by 


m 


9 


xC  yi  (ycos6-xsin9 ) 2f (x.v) 
x  y 

m2  0sin2®+ra02cos^®-^inllsin^cos ' 


(3)  gives  the  minimum  moment  of  inertia.  Maximum  moment  of  inertia  is 
found  from  (3)  when  0  is  replaced  by  9+ir/2.  The  ratio  of  minimum  to 
maximum  moment  of  inertia  is  a  measure  of  elongation. 

Radius  vectors :  The  length  of  the  radius  vector  from  centroid  to  the 
points  on  the  perimeter  describes  the  shape  of  an  object.  Important 
radius  vector  information  includes  the  length  of  the  maximum  radius 
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vector  and  its  orientation,  length  of  the  minimum  radius  vector  and 
its  orientation  and  length  of  the  average  radius  vector.  The  derived 
features  from  these  basic  features  include  the  angular  difference 
between  the  maximum  and  minimum  length  radius  vector,  and  a  number  of 
ratios  of  maximum,  minimum  and  average  length  radius  vectors. 

Moments :  Hu  [7]  has  derived  the  seven  moments  which  are  invariant  to 
rotation,  translation  and  scale.  Although  these  moments  don't  carry 
any  physical  intuition,  but  they  are  quite  useful.  For  the  sake  of 
completeness,  they  are  given  below, 


Q1  =  n  2  0  + 11 0  2 

Q2  =  <n20"*n02^  +4nll 

^3  =  *n30~3n12^  +^3n21~n03^ 

Q4  =  (n30+n12)  +(n21+n03J 

Q5  =  (n30~3n12) (n30+n12) [(n30+n12)2-3(n21+n03)2] 

+  (3n21-nQ3)  (n21+nQ3)  [3  (n30-f-n12)  -(n2l+n03)  ^ 

Q6  =  (n20~n02}  1  (n30+n12)2_ (n21+n03)  ^4nll (n30  +  n12 }  (n21  +  n03 J 
Q?  =  (3n21-nQ3  )  (n3Q+n12)  [  (n3Q+n12)  -3  (n2’J'no3  )  2  ' 

+(3n12-n30) (n21+n03} f3(n30+n12)2'(n21+n03)2) 


where 


pq 


=  ^£2 


147 


and 


Hpq  *  Z)  2  Cx-X)p(y-Y)qf  (x,y) 
r  x  y 

It  is  to  be  noted  that  compared  with  the  other  basic  features, 
the  computation  of  these  moments  is  costlier. 

Ratios  of  Discrete  Fourier  Transform  (DFT )  Coefficients  as  Shape 
Features 

Various  researchers  [6]  have  investigated  the  use  of  Fourier 
transform  itself  or  the  features  derived  from  the  transform  to  obtain 
the  features  which  are  invariant  to  rotation,  translation  and  scale. 
In  this  study  we  use  some  shape  features  derived  by  Granlund  [8]  in 
continuous  case.  Because  of  the  changes  from  frame  to  frame  it  is  not 
possible  to  use  these  features  to  obtain  rotation  (initial  point  does 
not  remain  the  same  from  scene  to  scene  because  object  is  moving  and 
moreover  it  is  obtained  after  segmentation).  We  use  4  shape  features, 
which  are  representative  of  the  form  of  the  contour. 
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ai's  t^ie  a^ove  equations  are  Fourier  coefficients.  These  features 
are  complex  in  general.  We  have  computed  these  features  using 
Goertzel  algorithm  [9]  since  it  is  faster  by  a  factor  of  2  than  FFT 
and  moreover  it  does  not  require  that  the  transform  be  computed  at  all 
the  points.  Here  we  need  only  8  DFT  coefficients.  The  boundary 
points  are  put  in  a  complex  array  and  the  signal  is  padded  with  zeros. 
The  transform  size  is  taken  to  be  1024.  We  cannot  take  the  perimeter 
as  the  transform  size  since  the  perimeter  of  an  object  varies  from 
scene  to  scene  either  because  the  segmentation  is  not  perfect  or  the 
shape  of  the  object  changes. 

Contextual  Features 


In  this  study  we  have  used  five  contextual  features.  One  of  them 
is  principally  a  size  feature.  It  is  the  area  of  an  object  relative 
to  other  objects  (relation  is  greater).  The  other  4  features  are 
neighbors  of  the  object,  relative  position  of  an  object  in  relation  to 
its  neighbors,  centroidal  distance  between  an  object  and  its  neighbors 
and  the  minimum  distance  between  an  object  and  its  neighbors. 
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The  neighbors  of  an  object  are  found  by  finding  all  those  objects 
which  are  completely  or  partially  within  an  area  of  twice  the  width 
and  length  of  the  rectangle  of  this  object  when  the  rectangle  is 
placed  about  the  centroid  of  this  object.  This  method  of  finding 
neighbors  takes  into  consideration  only  the  size  of  the  object.  A 
better  way  would  be  to  consider  the  size  as  well  as  the  velocity  of 
the  object  in  the  determination  of  neighbors.  Note  that  neighborhood 
relation  is  not  symmetric. 

Relative  Position 

For  all  the  objects  which  are  neighbors  of  an  object,  we  find  the 
relative  position  of  these  neighboring  objects  with  the  object. 
Relative  position  is  given  by  4  relations,  above,  below,  to  the  right 
and  to  the  left.  An  object  R1  is  said  to  be  above  object  R2,  if 

(Top (R1 )  <  Top (R2) )  and 

(Bottom (Rl)  <  Centroid  X(R2))  and 

(Right (R1 )  MIN  Right (R2 ) )  >  (Left(Rl)  MAX  Left(R2)) 

An  object  Rl  is  said  to  be  below  object  R2,  if 

(Bottom(Rl)  >  Bottom(R2))  and 

(Top (Rl )  >  Centroid  X  ( R 2 ) )  and 

( (Right (Rl )  MIN  Right(R2))  >  (Left(Rl)  MAX  Left(R2)) 

An  object  Rl  is  said  to  be  to  the  left  of  object  R2,  if 

(Lef t (Rl)  <  Left (R2) )  and 

(Right(Rl)  <  Centroid  Y(R2))  and 

(  (Top  (Rl )  MAX  Top  (R2)  )  >_  (Bottom(Rl)  MIN  Bottom(R2)) 


Similarly,  an  object  Rl  is  said  to  be  to  the  right  of  object  R2,  if 

(Right (R1 )  >  Right(R2))  and 

(Left(Rl)  >  Centroid  Y(R2))  and 

( (Top (Rl )  MAX  Top (R2) )  >  (Bottom(Rl)  MIN  Bottom(R2)) 

Although  the  neighborhood  relation  is  not  symmetric,  yet  we  have 
taken  the  relative  position  relation  as  symmetric.  For  example  (see 
Table  2)  although  object  5  has  no  neighbor,  yet  it  is  said  to  be  above 
7  because  object  7  has  object  5  as  its  neighbor  and  object  7  is  below 
object  5.  Also  note  that  although  an  object  may  have  a  particular 
neighbor,  but  there  may  not  exist  any  relative  position  relation 
between  them. 

Centroidal  distance  between  an  object  and  its  neighbors  -  it  is 
the  distance  between  the  centroid  of  this  object  and  its  neighbors. 

Minimum  distance  between  an  object  and  its  neighbors  -  it  is  the 
minimum  distance  between  the  boundary  of  this  object  and  its  neighbor. 
In  other  words  it  will  be  the  minimum  distance  travelled  by  one  of 
these  objects  when  they  will  start  touching  each  other. 

An  Example 

For  the  test  sequence  shown  in  Fig.  2,  computation  of  all  the 
features  is  shown  in  Table  1  and  2  which  show  the  basic  and  derived 
features  for  object  1  and  contextual  features  for  all  the  objects  in 
the  first  frame.  These  tables  are  self  explanatory. 

Conclusion 

In  this  paper  we  have  discussed  the  computation  of  various 
features  which  should  be  useful  in  the  analysis  of  image  sequences. 
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Table  1.  Basic  and  derived  features  for  object  1 
of  frame  1  (see  Fig.  2). 


No.  of  elements  in  the  chain  code:  27 
Starting  coordinates:  (16,79) 
000000007  556  5'5444  53  43  43 1111 


Basic  Features 


1.  XMAX:  23 

2.  XMIN:  16 

3.  YMAX:  88 

4.  YMIN:  75 

5.  Perimeter:  27 

6 .  Area :  7  2 

7.  X  Centroid:  19 

8.  Y  Centroid:  82 

9.  Orientation:  1.817  Rad. 

10.  Major  axis:  13.33 

11.  Minor  axis:  7.040 

12.  Max.  Moment  of  Inertia:  766.7 

13.  Min.  Moment  of  Inertia:  235.2 

14.  Max.  Radius:  7.071 

15.  Min.  Radius:  3.000 

16.  Avg.  Radius:  4.444 

17.  Angle  Max.  Radius:  278.13  Degs . 

18.  Angle  Min.  Radius:  180.00  Degs. 

19.  Moment  1:  13.91 

20.  Moment  2:  54.50 

21.  Moment  3:  2.042 

22.  Moment  4:  2.945 

23.  Moment  5:  7.215 

24.  Moment  6:  17.95 

25.  Moment  7:  -.33708 

26.  Mag  D22:  .9911 

27.  Phase  D22:  .0000 

28.  Mag  D12 :  .9933 

29.  Phase  D12:  .000(7 

30.  Mag  D13 :  .9867 

31.  Phase  D13:  .0000 

32.  Mag  D44 :  . 9648 

33.  Phase  D44:  .0000 


Derived  Features 


1.  X  Center  of  Rectangle:  20 

2.  Y  Center  of  Rectangle:  82 

3.  Width  of  the  Rectangle:  7 

4.  Length  of  the  Rectangle:  13 

5.  Width/Length:  .5384  2 

6.  Thinness, Area/Perimeter  :  .9876 

7.  Major  axis/Minor  axis:  1.894 

8.  Max.  Moment  of  Inertia/Min.  Moment  of  Inertia:  3.259 

9.  Max.  Radius/Min.  Radius:  2.357 

10.  Max.  Radius/Avg.  Radius:  1.590 

11.  Min.  Radius/Avg.  Radius:  .6749 

12.  Angular  difference  (Max.  and  Min.  Radius):  98.130  De 
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Table  2.  Contextual  features  for  frame 


The  rate  of  change  of  feature  values  should  be  very  helpful  in 
generating  more  accurate  and  .precise  prediction  model.  The 
computation  of  these  features  is  conditional  as  shown  in  Fig.  1.  The 
work  along  these  lines  is  presently  under  investigation  and  will  be 
reported  in  the  future. 
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2.10  Region  Descriptions  Using  Range  Data 

A.  Huertas,  S.  Inokuchi  and  R.  Nevatia 


Introduction 

Scene  analysis  problems  including  segmentation  and  shape  analysis 
are  simplified  when  range  data  ,  i.e.  the  distances  of  observed  points 
of  a  scene  from  the  viewer,  is  available.  An  edge  based  approach  has 
been  described  in  [1]  by  which  fairly  complete  region  boundaries  can 
be  detected  in  range  pictures.  Boundary  segments  can  then  be  traced 
in  order  to  find  closed  regions  and  some  region  properties  can  be 
inferred  by  observing  the  orientation  of  the  segments  along  a  region. 
Descriptions  on  how  regions  occlude  other  regions  can  also  be 
obtained . 

In  this  report  we  describe  a  boundary  tracing  program  which 
traces  region  boundaries  that  have  been  detected  using  the  method 
described  in  [1].  The  region  descriptions  provided  by  the  program 
include  a  classification  of  the  regions  found  as  complete  parts, 
subparts,  complete  holes  and  spaces.  Occlusion  information  is 
obtained  by  examining  the  classification  of  the  regions  and  the 
available  range  data. 


Region  Descriptions 


The  region  description  process  proceeds  in  two  steps. 
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Region  boundary  tracing 


a . 


Finding  boundaries  of  closed  regions  is  relatively  simple,  if 
complete  boundary  segments  are  found.  By  maintaining  a  list  of  the 
segments  with  two  locations  for  each  segment,  suitable  segmerts  can  be 
selected  to  start  the  trace  of  a  region.  Once  a  seament  is  selected 
the  boundary  can  be  followed  by  selecting  the  appropriate  sequence  of 
"next  segments".  Each  of  the  two  positions  in  the  list  are  used  to 
mark  a  selected  segment  as  "visited”  according  to  the  direction 
followed. 

The  process  proceeds  in  a  counterclockwise  manner.  The  "next 
segment"  function  (NSF)  selects  among  the  set  of  segments  with  one  end 
point  lying  within  a  one  pixel  neighborhood  of  the  end  point  of  the 
last  traced  segment  (current  segment) ,  the  one  that  has  the  smallest 
angle  measured  clockwise  at  the  end  point  of  the  current  segment  if 
the  segments  intersect.  Otherwise  the  angle  is  measured  at  the 
intersection  of  the  extension  of  the  current  segment  with  the  segment 
being  considered  for  selection.  Figure  1  shows  several  examples  of 
how  the  angles  are  measured.  If  the  selected  segment  is  located 
within  a  certain  distance  of  the  frame  of  the  picture,  the  frame  is 
followed  in  the  right,  up,  left  or  down  direction  according  to  the 
section  of  the  frame  encountered  (bottom,  right  side,  top  or  left  side 
respectively).  The  first  segment  encountered  with  one  end  point  lying 
within  a  certain  distance  of  the  frame  is  selected  and  the  process 
continues . 

Four  aspects  need  to  be  considered  when  selecting  the  next 
segment  in  the  tracing  process. 

1)  Figure  la  shows  that  one  pixel  long  segments  might  be  skipped 
if  they  are  collinear  with  the  current  segment.  Segment  5  ^  is  the 
current  segment,  S2  is  a  one  pixel  long  segment,  and  both  S2  and  S3 
have  an  end  point  within  one  pixel  of  the  end  point  of  S3.  Since 


0 ^  ^ ,  S ^  is  selected  and  S2  is  skipped.  The  NSF  knows  if  a  segment 
is  skipped  and  marks  it  as  visited  to  prevent  its  being  selected  again 
when  starting  the  trace  of  a  new  region.  Figure  lb  shows  that  if  a 
one  pixel  long  segment  is  not  collinear  with  the  current  segment,  the 
direction  of  the  trace  might  be  reversed.  is  the  current  segment, 
S ^  is  a  one  pixel  long  segment  with  both  ends  lying  within  one  pixel 
from  S^,  and  has  one  end  point  within  one  pixel  from  S^.  The  angle 
0^  between  and  S2  measured  at  the  head  of  taking  S2  in  the 
direction  opposite  to  the  orientation  of  is  usually  very  small. 
This  clue  is  used  by  the  NSF  if  a  change  in  direction  occurs  to 
determine  the  selection  or  rejection  of  a  one  pixel  long  segment  as  a 
suitable  next  segment. 

2)  Regions  that  can  be  described  as  parts  or  subparts  tend  to  be 
convex  shaped.  Figure  lc  shows  that  the  process  can  be  side  tracked 
if  the  angle  0^  between  the  current  segment  and  S3  is  larger  than 
180  degrees.  To  avoid  this,  the  NSF  places  a  window  centered  at  the 
end  of  the  current  segment  S^.  This  window  can  grow  up  to  three 
pixels  in  diameter  in  an  attempt  to  locate  a  segment  which  preserves 
convexity. 

3)  If  N  is  the  size  of  the  masks  used  during  the  boundary 
detection  process,  short  segments  lying  within  (N-l)/2  pixels  of  the 
frame  of  the  picture  are  considered  unreliable  due  to  the  incomplete 
data  resulting  from  the  convolution  of  the  picture  data  with  the 
masks.  These  segments  are  eliminated  in  a  preprocessing  step. 

4)  Non-oriented  segments  generated  by  the  radial  line  detection 
process  described  in  [1]  fall  one  pixel  short  of  the  location  where 
they  would  meet  existing  segments.  These  segments  are  shown  in 
figures  Id  and  3  as  thick  lines  with  no  orientation.  Figure  Id  shows 
the  current  segment  S^,  the  segment  with  one  end  point  within  one 
pixel  from  S  ,  and  S  with  one  end  point  located  two  pixels  from  S^. 

X  fc 

Notice  that  although  segment  meets  all  the  criteria  for  a  suitable 
next  segment,  the  desired  next  segment  is  S2«  By  extending  the  loose 
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end  of  the  non-oriented  segments  by  one  pixel  in  the  preprocessing 
step,  the  desired  next  segment  can  be  easily  selected. 


When  the  process  is  unable  to  select  a  suitable  next  segment 
given  the  above  considerations ,  the  current  segment  is  extended  by  up 
to  a  fixed  number  of  pixels.  At  each  step  a  suitable  next  segment  is 
sought  for.  The  trace  of  the  current  region  is  terminated  if  no 
suitable  segment  is  found  and  the  region  being  traced  is  classified  as 
an  incomplete  part. 


Two  criteria  determine  a  closed  region.  The  starting  point  of  a 
trace  is  within  one  pixel  of  the  ending  point  and/or  the  sum  of  the 
exterior  angles  is  approximately  equal  to  360  degrees  in  absolute 
value . 

b.  Region  descriptions 

Some  region  properties  can  be  inferred  by  observing  the 
orientation  of  the  segments  along  the  boundary  of  a  region  as  shown  in 
figure  2.  Basically  the  parts  and  spaces  can  be  differentiated  oy 
whether  the  segments  in  the  boundary  are  oriented  in  the  same 
direction  as  the  trace  or  in  the  opposite  direction.  Related  line 
labeling  analysis  can  be  found  in  [2]. 

Occlusion  is  indicated  by  not  all  segments  pointing  in  the  same 
direction,  compared  to  the  direction  of  the  region  tracing.  By 
generating  a  set  of  triples  (A,B,C)  indicating  that  region  A  and 
region  B  share  a  common  segment  C,  the  following  can  be  asserted: 

If  the  B  component  in  a  triple  occurs  only  once  among  the  set  of 
all  triples  then  region  B  occludes  region  A  and  region  A  cannot  be 
merged  with  any  other  region  to  form  a  larger  region.  In  our  example 
shown  in  figure  3  the  triple  ( 5 , 1 , S±)  corresponds  to  region  5  (A 
component)  being  occluded  by  region  1  (B  component).  No  other  triple 
contains  region  1  as  its  B  component  and  therefore  region  5  cannot  be 
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Complete  Part 


Sub  Parts 


Space 


Complete  Hole 


Figure  2.  Types  of  regions. 
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CLASSIFICATION 

OF  REGIONS 

Region 

1 

complete  part 

Region 

10 

subpart 

Region 

2 

complete  part 

Region 

11 

complete  part 

Region 

3 

complete  part 

Region 

12 

space 

Region 

4 

complete  part 

Region 

13 

space 

Region 

5 

subpart 

Region 

14 

space 

Region 

6 

subpart 

Region 

15 

space 

Region 

7 

complete  part. 

Region 

16 

space 

Region 

8 

subpart 

Region 

17 

space 

Region 

9 

subpart 

Region 

18 

incomplete  part 

OCCLUSION 

Parts  occluding  one  part: 

Subpart  5  is  occluded  by  complete  part  1 

Subpart  6  is  occluded  by  complete  part  4 

Subpart  8  is  occluded  by  complete  part  4 

Subpart  9  is  occluded  by  complete  part  8 

Subpart  10  is  occluded  by  compete  part  7 

Parts  occluding  two  parts: 

Subparts  6,  3  are  one  part  occluded  by  complete  part  4 


Figure  3. 
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merged  with  any  other  region. 

If  the  A  component  in  a  triple  occurs  only  once  among  the  set  of 
all  triples  then  only  the  region  indicated  by  the  B  component  occludes 
it.  In  our  example  the  A  component  in  the  triple  (9,8,Sj)  corresponds 
to  region  9  and  occurs  only  once.  Therefore  only  region  8  occludes 
it. 


If  two  or  more  triples  have  the  same  B  component,  they  can  be 
merged  to  form  a  larger  region  provided  they  lie  on  the  same  plane. 
By  finding  some  of  the  interior  points  in  the  occluded  regions,  the 
coefficients  of  the  equations  of  their  plane  surfaces  can  be 
estimated.  If  the  two  planes  are  within  certain  tolerances  of  each 
other,  then  the  occluded  regions  can  be  merged  to  form  a  larger 
region.  In  our  example  the  triples  (6,4,S^)  and  (8,4,3^)  have  the 
same  B  component.  By  estimating  and  comparing  the  coefficients  of  the 
planes  on  which  regions  6  and  8  lie,  it  is  determined  that  they  can  be 
merged  to  form  a  larger  region. 

-  If  two  or  more  triples  have  the  same  A  component  then  two  or  more 
regions  occlude  the  region  indicated  by  the  A  component.  The 
occluding  regions  (B  components)  can  be  merged  to  form  a  larger 
occluding  region  if  they  lie  in  the  same  plane  and  have  at  least  one 
segment  in  common.  No  instance  of  this  case  occurs  in  our  example. 

Figure  3  shows  the  regions  in  our  example  and  the  classification 
made  by  the  program  and  the  description  of  how  regions  occlude  other 
regions . 
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5.  HARDWARE  IMPLEMENTATION  OF  IU  ALGORITHMS 


ADVANCED  IMAGE  UNDERSTANDING  USING  LSI  AND  VLSI 

S.D.  Fouse,  V.S.  Wong,  and  G.R.  Nudd 

Hughes  Research  Laboratories 
Malibu,  California  90265 


Abstract 

We  describe  here  the  work  undertaken  at  the  Hughes  Research 
Laboratories,  Malibu,  in  support  of  the  DARPA  Image  Understanding  (IU) 
program.  This  report  covers  the  period  from  May  1980  through  September 
1980.  The  principal  aim  of  our  work  during  the  present  phase  of  the 
contract  is  to  investigate  the  applicability  and  potential  benefit  of 
very  large  scale  integrateion  (VLSI)  to  IU  systems.  Our  work  cowards 
this  goal  includes  detailed  logic  design  for  two  intermediate'- level  IU 
systems,  a  line  finder  and  a  texture  classif icetion  system,  and  identi¬ 
fication  of  a  highly  modular  programmable  digital  processing  element, 
which  is  currently  under  construction.  In  addition  to  the  major  empha¬ 
sis  of  the  program,  we  are  directing  our  work  so  that  the  hardware 
designed  will  be  fully  compatible  with  existing  commercial  hardware  such 
as  the  DEC  mainframes.  The  details  of  the  work  that  has  been  completed 
to  date  as  well  as  our  goals  for  the  program  are  described  below. 
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I. 


INTRODUCTION 


The  main  emphasis  of  chis  program  is  the  investigation  of  the 
impact  and  potential  benefit  of  very  large  scale  integration  (VLSI)  and 
high-density  IC  technologies  for  image  understanding.  The  task,  is  some- 
whac  wider  than  our  previous  work  for  the  Image  Understanding  Program, 
where  we  successfully  developed  special-purpose  high-speed  primitives  fo 
the  low-level  processing  operations. 

The  progress  chat  is  currently  being  made  in  silicon  technology 
and  the  development  of  more  sophisticated  algorithms  for  image  under¬ 
standing  provide  the  basis  for  a  major  advance  in  processing  capability. 
The  work  described  here  is  being  undertaken  at  Hughes  Research  Laborator 
ies  (HRL) ,  where  we  are  actively  involved  in  the  development  of  image 
analysis  and  understanding  software  for  applications  such  as  terminal 
homing,  image  bandwidth  compression,  and  scene  matching.  We  also  have 
numerous  active  research  programs  in  micro-electronic  technology.  The 
people  involved  in  this  program  were  also  involved  in  the  VHSIC-0  pro¬ 
gram.  In  addition  at  HRL  we  have  eleven  VHSIC-3  technology  development 
programs.  We  are  therefore  aware  of  the  major  developments  in  these  and 
other  important  programs,  and,  where  appropriate,  we  have  coordinated 
our  efforts. 

Our  major  emphasis  has  been  to  review  numerous  complex  image¬ 
understanding  algorithms  and  devise  VLSI  concepts  for  them.  This 
includes  understanding  the  processing,  data  flow,  timing,  and  storage 
requirements.  We  have  also  performed  a  detailed  partitioning  of  poten¬ 
tial  VLSI  chips  based  on  total  gate  count,  silicon  area,  communication 
requirements,  and  power  considerations.  From  this  work,  three  poten¬ 
tial  systems  have  emerged:  a  line  finder,  texture  analyzer,  and 
segmenter.  The  details  of  this  are  included  in  this  report  together 
with  the  necessary  gate  estimate  and  parts  count  for  VLSI  chips 
(Tables  1  and  2). 

In  addition  to  Chis,  we  have,  through  our  detailed  analysis  of 
the  VLSI  processor  requirements  and  conf iguration ,  identified  a  highly 
modular  programmable  digital  processing  element  RADIUS  (residue-based 
arithmetic  image-understanding  svstem).  These  concepts  are  now  being 
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developed  in  scate-of-che-art  n-MOS  technology,  and  we  anticipate  that 
the  first  demonstration  will  be  available  in  the  early  part  of  1981. 

Since  RADIUS  is  based  on  residue  arithmetic,  it  can  perform  a  wide  vari¬ 
ety  of  processing  operations  (including  variance  and  moment  calcula¬ 
tion)  and  all  convolutions  and  local  area  operations  with  very  high  cir¬ 
cuit  function  density.  The  modularity  of  our  approach  allows  rapid  and 
easy  design  in  VLSI  and  provides  programmability  and  portability. 

In  addition  a  major  emphasis  of  all  our  work  at  this  point  is 
to  ensure  that  the  hardware  concepts  and  changes  will  be  fully  integrat- 
able  with  existing  commercial  hardware,  such  as  the  DEC  mainframes.  To 
this  end,  we  are  developing  the  necessary  interfaces  to  the  RADIUS  pro¬ 
cessor  so  that  it  can  be  used  as  an  attached  processor  to  the  DEC  series, 
communicating  through  the  DEC-UNIBUS.  We  are  obtaining  a  PDP11-34  to 
use  as  the  test  vehicle  for  our  approach.  We  anticipate  that  this  machine 
will  be  available  in  mid-1981,  after  which  the  successful  development  of 
our  hardware  and  interface  will  provide  a  means  for  integrating  the  system 
into  the  DARPA  IU  test-bed  as  well  as  other  systems. 

Our  work  on  more  complex  IU  algorithms  in  addition  to  the  three 
cited  above  will  continue  to  enable  us  to  identify  special  elements  in 
the  IU  chain  that  can  appropriately  exploit  the  VLSI  3nd  VHSIC  develop¬ 
ments.  This  work,  to  be  reported  on  in  the  next  semi-annual,  will  include 
concept  developments,  partitioning,  simulation,  chip  count,  and  where 
appropriate,  layout  details. 

II.  METHOD 

Our  objective  for  this  project  is  to  provide  some  general  results 
that  will  allow  the  image  understanding  community  to  take  full  advantage 
of  VLSI  technology  as  it  becomes  available.  We  expect  our  results  to 
include  definitions  for  several  special-purpose  chips  that  will  have 
wide  applicability  to  IU  systems.  The  question  to  be  answered  then  is 
what  types  of  functions  should  be  cast  into  a  '.'LSI  chip.  The  goals  of 
this  program  are  analogous  to  the  on-going  VHSIC  efforts  where  they  are 
searching  for  commonality  across  a  broad  range  of  DOD  systems.  Here  we 
are  trying  to  take  a  longer  range  view  specifically  for  image  understanding. 
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The  project  has  been  coordinated  with  the  Hughes  VHSIC  program  and  shares 
many  goals  with  it.  The  major  difference  is  that  we  would  like  to  see 
commonality  across  IU  systems.  We  are  tracking  the  VHSIC  program,  and, 
where  appropriate,  we  will  be  able  to  take  advantage  of  the  work  being 
done  there  that  complements  our  own  effort. 

The  area  of  this  study  is  necessarily  somewhat  broad.  It  differs 
considerably  from  our  previous  work,  where  we  developed  special-purpose 
high-speed  primitives  for  IU.  Our  aim  this  period  has  been  to  stand 
back  and  take  a  broad  a  vew  of  IU  requirements  and  to  attack  the  ques¬ 
tion  of  what  special-purpose  VLSI  functions  would  be  appropriate  for  IU 
systems  chat  will  not  be  developed  under  present  or  foreseeable  VHSIC 
programs.  Our  approach  is  to 

•  Select  three  representative  systems  to  study  and 
characterize 

•  Perform  a  commonality  study  across  the  three  systems 

•  Do  logic  designs  for  each  system 

•  Partition  the  designs  onto  VLSI  chips 

•  Test  and  refine  designs  using  simulation  techniques 

•  Identify  relevant  system  parameters 

•  Generalize  to  additional  systems  and  refine  function 
partitioning. 

So  far,  we  have  selected  three  systems  and  performed  a  preliminary 
commonality  study,  identifying  one  subsystem  as  a  likely  candidate  for 
a  VLSI  chip.  This  is  a  local  area  processor  which  will  perform  sliding 
window  arithmetic  operations  including  convolution,  variance,  and 
moment  calculations.  We  have  designed  a  processor  based  on  the  residue 
arithmetic  technique  that  should  provide  significantly  better  perform¬ 
ance  far  this  type  of  operation  than  a  binary  processor.  In  addition, 
we  have  performed  logic  designs  on  two  of  the  three  systems  selected: 
the  line  finder  and  the  texture  classification  systems.  What  remains 
to  be  done  is  to  design  the  third  system,  the  segmenter,  refine  our 
designs  using  results  from  simulations,  and  then  extend  the  results 
to  additional  IU  systems. 


III.  RESIDUE- BASED  ARITHMETIC  IMAGE  UNDERSTANDING  SYSTEM  (RADIUS) 

Almost  all  of  the  systems  we  have  looked  at  require  some  sort  of 
local-area  processing  where  the  output  pixel  is  a  function  of  the  input 
pixel  and  its  M  nearest  neighbors,  where  M  is  usually  8,  24,  48,  etc. 

The  function  takes  numerous  forms  but  typically  is  either  arithmetic  or 
logical  or  a  combination  of  both.  If  the  function  is  arithmetic  and 
does  not  require  division  or  absolute  value,  then  residue  arithmetic 
techniques  can  be  used.  Examples  where  these  conditions  are  met  can  be 
found  in  two  of  the  systems  we  are  studying:  the  line,  finder  and  the 
texture  classification  systems.  The  line  finder  detects  edges  by  con¬ 
volving  the  image  with  six  5x5  masks,  each  mask  responding  to  a  dif¬ 
ferent  direction.  Since  the  convolution  operation  requires  multiplica¬ 
tions  and  additions,  this  is  easily  done  using  residue  techniques.  Simi¬ 
larly,  the  texture  classification  system  convolves  the  input  image  with 
several  5x5  masks. 

An  arithmetic  local  area  processor  seems  to  be  a  natural  choice 
for  a  subsystem  that  can  take  advantage  of  VLSI  technology.  This  is 
because  of  the  complexity  of  the  logic  for  doing  multiples.  Using  cur¬ 
rent  technology,  a  high-speed  multiplier  requires  an  entire  chip  (such  as 
the  TRW  10-MHz  multiplier).  One  approach  being  considered  for  real¬ 
time  hardware  is  to  compromise  on  die  coefficients  and  thereby  reduce  the 
complexity  of  the  hardware.  The  residue  techniques  can  offer  signifi¬ 
cant  advantages  without  compromise,  since  multiplication  can  be  per¬ 
formed  using  look-up  tables;  these  tables  will  be  small  because  the  bases 
used  will  be  small.  As  an  illustration  of  this,  Figure  1  shows  the  com¬ 
ponents  involved  in  RADIUS.  The  input  data  are  initially  encoded  into 
its  equivalent  representation  in  each  base.  This  is  most  easily  done  using 
a  ROM  lookup  table  with  an  address  space  equal  to  the  input  dynamic  range 
and  a  bit  depth  equal  to  the  number  of  bits  required  to  represent  the 
Base-1.  The  actual  computation  is  then  performed  by  the  local  area 
processors,  one  processor  per  base  used.  The  processor  output  word  sice 
is  identical  to  the  input  word  size  with  no  loss  in  accuracy.  This  is 
due  to  the  modular  nature  of  residue  arithmetic.  The  data  out  of  the 
convolvers  then  goes  into  a  residue-to-binary  decoder.  This  block  can 
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Figure  1.  A  residue-based  processor. 
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be  composed  of  a  very  large  lookup  table  or  a  system  composed  of  ROMs 
and  adders.  The  exclusive  use  of  look-up  tables  is  well  suited  to  VLSI 
since  these  are  highly  regular  structures  and  hence  can  be  made  very 
dense  and  inexpensive. 

Many  arithmetic  operations  can  be  performed  with  table  look-ups, 
but  without  the  residue  concept,  this  approach  is  typically  not  feasi¬ 
ble,  as  the  tables  required  become  overwhelmingly  large.  Since  both  the 
line-finder  system  and  the  texture  system  utilize  5x5  convolutions, 
the  processor  system  should  be  capable  of  being  used  to  perform  5x5 
convolutions  with  programmable  weights.  The  dynamic  range  capabilities 
of  the  system  should  allow  for  6  bit  input  and  6  bit  kernel  weights  for 
a  total  output  dynamic  range  of  17  bits.  In  addition,  the  system  should 
be  able  to  operate  at  a  10-MHz  data  rate.  The  dynamic  range  of  a  resi¬ 
due  computation  system  is  determined  by  the  product  of  all  the  bases. 

Using  31,  29,  23,  and  19  as  bases  will  give  us  a  dynamic  range  of  392,863, 
L7 

which  exceeds  2  .  The  bases  will  have  other  benefits  since  they  are 

al  prime  numbers. 

Figure  2  shows  a  block  diagram  for  a  5  x  5  local  area  processor 
for  a  single  base.  There  are  five  data  inputs,  one  for  each  line  of  the 
5x5  area.  Each  input  data  is  put  into  a  register  which  is  the  first 
element  of  a  five-element  shift  register.  When  new  data  is  transferred 
in,  the  previous  data  is  transferred  to  the  next  register,  and  so  on. 

There  are  five  inputs  and  thus  five  shift  registers  -with  five  elements 
each  for  a  total  of  25  registers.  The  contents  of  each  of  the  25  regis¬ 
ters  are  used  to  address  32  element  by  5  bit  RAMs.  These  RAMs  are 
lookup  tables  for  the  multiplication  of  Che  input  data  and  the  kernel 
weight.  The  outputs  of  the  25  RAMs  are  then  added  by  a  tree  of  24  resi¬ 
due  (or  modular)  adders.  A  residue  adder  calculates  (A  +  B)  mod  base. 

There  are  several  points  to  notice  about  this  design.  First, 
only  5  bits  come  out  of  the  multiply  or  add.  This  is  because  these 
operations  are  cyclical,  and  the  output  values  are  in  the  same  range  as 
each  of  the  input  values.  The  second  point  is  that  the  only  part  of  the 
processor  that  is  dependent  on  the  base  is  the  residue  adder.  Since 
the  multiplier  must  be  programmed  for  Che  weights  anyway,  the  choice  of 
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base  is  also  programmed  at  che  same  time.  Lastly,  concerning  the 
programming  of  the  RAMs,  a  bidirectional  data  bus  will  be  provided  on 
the  data  lines  of  the  RAMs  to  allow  programming  as  well  as  to  serve  as 
a  test  point  in  the  processor.  Additional  control  lines  into  a  binary 
decoder  will  be  required  for  programming  and  controlling  the  bus  for 
testing.  A  possible  design  for  Che  residue  adder  is  shown  in  Figure  3. 
Essentially,  the  device  performs  a  binary  addition  of  the  two  operands, 
compares  the  result  to  the  base,  and  subtracts  the  base  if  the  result  is 
greater  than  or  equal  to  the  base.  The  adder  is  programmed  by  providing 
che  base  value  to  the  comparator  and  the  two's  complement  value  of  the 
base  to  che  second  adder.  The  programming  can  be  done  in  one  of  two 
ways.  First,  and  preferable,  is  to  provides  register  on  the  chip  for 
the  base  value  and  another  register  for  che  two's  complement  of  the 
bases.  The  value  in  the  register  must  then  be  made  available  to  each  of 
the  residue  adders  that  will  be  on  the  chip.  The  second  alternative 
is  to  make  the  base  programmable  by  using  a  ROM.  This  has  the  advantage 
chat  base  values  can  be  stored  directly  in  the  adder,  which  will  prevent 
possible  routing  problems. 

We  are  developing  a  chip  based  on  RADIUS.  However,  to  reduce  the 
cost  of  development  and  che  associated  risk,  we  are  restricting  the  chip 
to  a  5  x  1  area  processor.  If  this  demor>= cration  is  successful,  we  will 
be  able  to  use  the  designs  and  digitized  data  base  as  a  common  module  to 
be  prepared  across  an  entire  VLSI  chip,  resulting  in  a  very  high  densitv, 
high  throughput  processor.  Figure  4  shows  the  design  for  a  5  x  1  proces¬ 
sor,  and  Figure  5  shows  the  configuration  of  a  5  x  5  convolution  system 
which  utilizes  the  5x1  processor  chip.  The  data  are  encoded  using  a 
64  element  by  20  bit  ROM.  The  kernel  is  generated  using  four  20  bit 
wide  line  delays,  5  bits  fcr  each  base.  The  output  of  the  line  delays 
is  input  directly  to  an  array  of  20  5  x  1  processors,  five  for  each  base. 
For  each  base,  four  1024  x  5  bit  ROMs  are  used  to  sum  the  results  of 
che  five  rows.  Note  chat  conventional  adders  cannot  be  used  since  these 
additions  must  be  done  modularlv,  just  as  on  the  chip.  Finally,  3  bits 
from  each  base  convolver  is  input  to  a  decoder  network,  which  will  be  an 
array  of  ROMs  and  possibly  some  logic.  The  output  of  the  decoder  is 
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chen  a  17  blc  binary  word.  We  expect  chat  parts  will  be  available  for 
testing  by  March  1981,  and  chat  a  demonstration  system  such  as  depicted 
in  Figure  5  will  be  completed  by  July  1981. 

IV.  TEXTURE  CLASSIFICATION  SYSTEM 

As  mentioned  earlier,  logic  design  was  performed  for  both  the 
texture  classification  system  and  the  line  finder  system.  This  section 
describes  the  design  generated  for  Che  texture  system.  Figure  6  illus¬ 
trates  the  data  flow  of  the  texture  system.  The  input  video  is  pro¬ 
cessed  in  N  independent  channels,  where  each  channel  generates  an  ele¬ 
ment  of  a  feature  vector.  This  means  that,  for  each  scalar  pixel  input, 
there  is  a  vector  value  output,  the  length  of  the  vector  being  equal 
to  N,  the  number  of  channels  in  Che  system.  The  processing  done  in  Che 
channels  involves  a  small  window  convolution,  a  normalization  step,  and 
a  large  window  energy  measure.  The  outputs  are  combined  to  form  a  vector, 
which  is  then  transformed  using  a  linear  transformation.  The  elements 
of  the  transformed  vector  are  used  to  evaluate  a  discriminant  function, 
the  value  of  which  is  used  to  perform  the  classif ication.  Alternatively, 
the  transformed  feature  vector  can  be  input  to  a  segmentation  routine. 

Six  major  functions  must  be  performed: 

•  Small  window  convolution 

•  Small  window  statistical  calculation 

•  Scaling 

•  Large  window  statistical  calculation 

•  Linear  transformation 

•  Discriminant  function  evaluation. 

We  discuss  below  the  operations  involved  in  each  of  the  functions  and 
the  hardware  required  to  perform  the  functions. 
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A. 


Small  Window  Convolution 


The  first  step  in  the  processing  involves  computing  a  5  x  5 
convolution.  The  design  of  a  system  (RADIUS)  to  perform  this  computa¬ 
tion  using  the  technique  of  residue  arithmetic  is  presented  in  the  pre¬ 
vious  section.  The  basic  block,  that  performs  the  5x1  convolution  is 
now  being  designed  as  an  NMOS  chip  (CRC  181)  at  the  Hughes  Carlsbad 
Research  Center. 

B.  Small  Window  Statistical  Measure 

A  general  structure  for  calculating  a  statistical  moment  over  a 
two-dimensional  window  was  described  in  Hughes  invention  disclosure 
PD80078;  Figure  7  shows  the  specific  structure  for  calculating  the  vari¬ 
ance  over  a  5  x  5  window.  This  architecture  assumes  that  the  image 
data  are  being  input  in  raster  scan  format.  As  each  pixel  is  intro¬ 
duced  into  the  system,  the  window  that  is  being  processed  is  shifted 
one  column  to  the  right,  dropping  the  left-most  column  and  adding  the 
column  on  the  right.  This  means  that  the  function  for  the  new  window 
position  can  be  calculated  by  subtracting  the  contribution  of  the  lost 
column  and  adding  the  contribution  from  the  new  column.  It  should  be 
noted,  as  illustrated  in  Figure  8,  that  as  the  center  of  the  5x5 
window  moves  across  the  image  each  subsequent  kernel  can  be  formed  by 
the  removal  of  a  single  pixel  at  the  top  right,  for  instance,  and  the 
addition  of  one  new  pixel  at  the  bottom  left.  With  this  proviso,  we 
can  scan  the  image  directly  by  successively  eliminating  the  5  pixel 
column  at  the  trailing  edge  and  adding  the  new  column  on  the  right  as 
shown.  Figure  8  illustrates  this  method  for  updating  the  window 
function. 

This  technique  greatly  reduces  the  necessary  data  bandwidth  for 
calculating  the  mean  and  the  sum  of  squares  for  the  processing  window; 
these  can  then  be  combined  to  form  the  variance.  The  structure 
shown  includes  shift  registers  for  pixel  storage,  column  storage  for  the 
mean  calculation,  the  column  storage  for  the  sum  of  squares  calculation. 
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Figure  7.  5x5  variance  calculation. 
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The  only  arithmetic  logic  required  for  this  structure  is  four  3-input 
adders  and  three  multipliers. 

This  is  a  function  that  could  be  implemented  using  a  residue  tech¬ 
nique,  since  only  arithmetic  functions  are  being  performed.  Since 
the  preceding  function  is  already  being  performed  using  residue  arith¬ 
metic,  the  conversion  back  to  binary  could  be  done  after  the  moment 
calculation.  The  structure  would  look  identical,  but  the  blocks  them¬ 
selves  would  each  be  smaller.  For  example,  with  a  smaller  wordsize  the 
memory  would  be  reduced.  Also,  the  multiplier  box  could  be  replaced 
with  look-up  tables,  thus  providing  a  savings  in  hardware.  To  deter¬ 
mine  the  feasibility,  a  statistical  dynamic  range  analysis  will  be  per¬ 
formed  to  see  how  many  bases  would  be  required  and  thus  if  the  residue 
technique  could  provide  normalization. 

C.  Normalization 

The  next  major  function  to  be  performed  is  a  division  or  nomali- 
zation  function.  The  output  of  the  convolutions  are  divided  by  the  out¬ 
put  of  the  small  window  moment  calculation,  on  a  pixel  by  pixel  basis. 
This  requires  that  there  be  some  memory  to  delay  the  output  of  the  con¬ 
volutions  while  Che  moment  is  being  calculated.  The  memory  required 
would  be  approximately  5  line  lengths  x  8  bits  x  N  channels. 

A  pipelined  system  for  performing  an  integer  divide  is  shown  in 
Figure  9.  This  performs  a  division  between  two  binary  numbers  using  the 
direct  method:  an  8  bit  divisor  would  require  8  stages.  If  more 
precision  were  required,  more  stages  could  be  used,  and  these  would  cal¬ 
culate  the  fractional  part  of  the  answer.  The  direct  method  was  chosen 
over  the  iterative  method  because  of  the  synchronous  nature  of  this 
system. 


D.  Large  Window  Statistical  Calculation 

The  structure  for  performing  the  large  window  statistical  calcu¬ 
lation  is  the  same  as  for  the  small  window  calculation.  The  function 
suggested  by  Laws  is  the  sum  of  the  absolute  values  of  the  pixels. 


181 


LATCH  c  _  A/B  LATCH 


apjAyp 


Figure  10  shows  a  structure  for  performing  this  function.  As 
suggested  in  the  discussion  of  the  small  window  moment  calculation, 
this  structure  could  be  implemented  in  residue.  In  fact,  if  the  normal¬ 
ization  step  could  be  skipped,  then  the  whole  system  could  be  accom¬ 
plished  in  residue.  There  are  two  problems,  however: 

•  An  absolute  value  cannot  be  accomplished  in  residue 
and  thus  the  next  best  thing  would  be  to  use  the 
sum  of  squares. 

•  A  preliminary  dynamic  range  analysis  using  the  sum 
of  squares  over  a  15  x  15  window  indicates  that  the 
system  would  need  to  support  a  dynamic  range  in 
excess  of  36  bits.  This  would  definitely  be  pro¬ 
hibitive  in  that  a  large  number  of  bases  would  be 
required  and  Che  bases  themselves  would  require  over 
5  bits  to  encode. 

It  may  be  that  a  statistical  dynamic  range  analysis  will  show  that  it  is 
possible  to  achieve  a  low  probability  of  overflow  with  a  more  reason¬ 
able  dynamic  range  (e.g.,  <30  bits). 

E.  Linear  Transformation 

This  function  will  generate  a  vector  with  M  components  by  form¬ 
ing  linear  combinations  of  the  N  components  of  the  input  vector.  Since 
the  structure  shown  in  Figure  11  can  be  used  to  generate  a  single  com¬ 
ponent  of  Che  output  vector,  the  en  'ire  output  vector  can  be  generated 
by  replicating  this  structure  M  times.  This  structure  is  basically 
Che  same  as  that  used  for  the  convolution.  It  is  composed  of  N 
registers,  N  multipliers,  and  N-l  adders.  As  with  the  convolution  the 
value  of  the  linear  weights  will  be  programmable,  with  the  actual  mech¬ 
anism  for  programming  depending  on  the  structure  of  the  multiplier 
(either  memory  look-up  or  logic).  The  only  difference  would  be  in  the 
format  of  Che  data  input.  For  the  convolution,  the  data  are  shifted 
through  the  registers;  but  for  the  linear  transformation,  the  registers 
are  all  loaded  in  parallel. 


Figure  11.  Linear  transformation  structure. 


F. 


Discriminanc  Function  Evaluation 


No  hardware  was  designed  tor  this  processing  step  since  the 
form  of  the  function  is  unknown.  If  the  function  is  linear,  then  the 
structure  used  in  the  previous  step  could  also  be  used  for  this  one. 

Any  other  form,  such  as  a  higher-order  polynomial,  would  require  addi- 
tonal  multipliers  and  adders. 

G.  Gate  Count  Tabulation 

Table  1  summarizes  the  results  of  the  texture  system  design, 
presenting  the  number  of  gates  required  for  each  function.  These  data 
will  be  used  later  in  the  study  for  the  VLSI  partitioning. 

V.  LINE  FINDER  SYSTEM 

This  section  describes  the  logic  design  of  the  line  finder  sys¬ 
tem.  Figure  12  shows  the  data  flow  graph  for  the  system  and  identifies 
four  major  functions: 

•  Edge  detection 

•  Edge  thinning 

•  Edge  linking 

•  Edge  tracing. 

A  design  is  presented  for  each  of  the  four  functions,  and  an  estimated 
gate  count  is  given. 

A.  Edge  Strength  Computation  and  Detection 

Edge  strength  computation  is  performed  by  convolving  5x5  masks 
in  six  directions  with  the  image  of  interest  (Figure  13).  The  mask 
directions  are  set  at  30°  intervals,  and  the  weights  are  shown  in 
Figure  14.  Following  edge  strength  computation,  the  magnitudes  from  the 
six  directions  are  compared,  and  the  direction  with  greatest  magnitude 
is  selected  as  the  edge  vector  for  the  pixel  location. 

There  is  some  question  as  to  what  the  optimum  mask  size,  weights, 
and  number  of  mask  directions  should  be.  For  example,  a  larger  mask  size 
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Logit:  diagram  for  edge  detection. 
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Figure  14.  Edge  masks  M^(i,j)  in  six  directions. 
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(say  5x5)  is  more  immune  to  noise,  but  takes  considerably  more  hardware 
to  implement  (than  say  a  3  x  3  mask).  The  same  can  be  said  for  implement¬ 
ing  the  masks  in  six  directions  as  opposed  to  four.  It  would  be  desir¬ 
able  from  a  hardware  point  of  view  to  use  a  small  mask  size  and  as  few 
mask  directions  as  possible  without  sacrificing  too  much  on  performance. 
RADIUS  can  be  used  to  perform  the  six  convolutions.  Following  these 
six  convolutions,  direction  tags  will  be  added  to  the  edge  magnitudes 
(see  Figure  13).  Each  data  word  consists  of  12  bits  at  this  stage,  8  bits 
for  magnitude  and  4  bits  for  direction.  The  magnitude  part  of  each 
direction  is  then  compared  with  that  of  the  other  directions,  until  the 
direction  with  greatest  magnitude  is  found.  Five  comparators  are 
needed  to  perform  the  magnitude  comparisons. 

Figure  15  shows  the  logic  for  performing  the  magnitude  comparison. 
The  corresponding  bits  of  each  data  word  are  compared  in  turn,  and 
greater  than  (G)  and  less  than  (L)  signals  propagate  towards  the  msb 
bits.  G{9}  and  L{9}  indicate  whether  A  is  greater  than  or  less  than  B. 

If  G{9}  and  L{9}  are  both  low,  then  the  two  words  are  the  same.  Sixty 
four  gates  would  be  necessary  to  implement  a  full  8  bit  comparator. 

Hence  320  gates  would  be  necessary  to  implement  the  five  comparators  in 
the  edge  detection  step.  It  has  been  estimated  that  it  takes  178  K  gates 
to  implement  the  six  residue  convolvers.  This  is  by  var  the  most  hard¬ 
ware  intensive  computational  part. 


B.  Thinning  and  Thresholding 

Thinning  in  Nevatia's  process  is  accomplished  by  comparing  the 
edge  magnitude  and  direction  at  a  pixel  location  with  that  of  some  of 
its  surrounding  eight  neighbors.  To  qualify  as  an  edge  point,  the  fol¬ 
lowing  rules  must  be  observed: 

(1)  The  edge  magnitude  at  the  pixel  location  must  be 
greater  than  that  of  its  two  neighbors  in  a  direc¬ 
tion  normal  to  the  direction  of  this  edge.  The 
normal  to  a  30°  edge  is  approximated  by  the  diag¬ 
onals  of  a  3  x  3  grid. 
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TOTAL  NUMBER  OF  GATES  TO  IMPLEMENT  8  BIT  COMPARATOR  -  8  X  (4  +  4)  =»  64  GATES 
Figure  15.  Logic  for  performing  magnitude  comparison. 
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(2)  The  edge  directions  of  the  two  neighboring  pixels, 
as  defined  in  (1),  must  be  withirr  1  unit  difference 
(30°)  from  that  of  the  central  pixel. 

(3)  The  edge  magnitude  of  the  central  pixel  must  exceed 
a  certain  fixed  threshold.  This  threshold  is  arbi- 
traritly  set  at  some  low  value. 

When  implementing  in  hardware,  the  algorithm  can  be  divided  into  two 
parts:  The  first  accesses  the  edge  magnitude  and  direction  of  the  two 
neighbors  normal  to  the  direction  of  the  edge.  The  second  compares  the 
magnitude  and  direction  of  the  central  pixel  with  those  of  the  two  neigh¬ 
bors  and  a  fixed  threshold  to  ensure  that  conditions  (2)  and  (3)  above 
are  met.  Figure  16  shows  the  logic  for  performing  thinning  and  threshold¬ 
ing.  The  eight  neighbors  of  the  central  pixel  are  first  sent  to  a 
switching  network,  which  decodes  the  edge  direction  of  the  central  pixel 
and  allows  the  edge  data  for  the  correct  two  neighbors  to  pass  on  to  the 
comparison  network.  In  the  second  part,  the  edge  magnitudes  and  direc¬ 
tions  are  compared,  and,  if  conditions  (2)  and  (3)  are  met,  an  edge  point 
is  presumed  present.  A  gate  count  reveals  that  190  gates  are  needed: 

8  for  the  switching  network  and  181  for  the  comparison  network. 

C.  Edge  Linking 

At  this  point,  the  data  format  for  each  pizel  is  pictured  in 
Figure  17.  One  bit  is  assigned  to  indicate  if  the  pixel  is  an  edge 
point,  3  bits  to  indicate  the  direction  of  the  edge,  and  8  bits  to 
indicate  edge  strength.  The  next  step  is  to  link  the  edge  points 
together  by  forming  predecessor  and  successor  members.  There  are  at 
most  three  possible  candidates  to  be  a  predecessor  or  successor  among 
the  eight  neighbors  of  an  edge  point;  however,  an  edge  point  can  have 
at  most  two  predecessors  and  two  successors.  In  such  a  case,  only  the 
primary  predecessor  (or  successor)  is  encoded,  and  a  special  bit  is 
marked  to  indicate  the  presence  of  a  fork.  The  following  rules  are 
observed  in  edge  linking: 
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Figure  16.  Logic  for  performing  thinning  and  thresholding 
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Figure  17.  Data  format  for  edge  point  after  thinning  and 
thresholding. 
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(a)  FORK.  2  SUCCESSORS  THAT  (bl  FORK.  2  SUCCESSORS  THAT  ARE 

ARE  NOT  4-NEIGHBORS  4  NEIGHBORS,  BUT  WHOSE  ORIENTATION 

DIFFER  BY  2  UNITS  (60  DEGREES) 
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(c)  NO  FORK.  2  POSSIBLE  SUCCESSORS  <d)  FORK,  3  POSSIBLE  SUCCESSORS 
WHOSE  ORIENTATIONS  ARE  THE  SAME. 


Figure  18.  Some  successor  configurations  illustrating  edge  linking 
rules. 
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(1)  The  orientation  of  a  predecessor  (successor)  must  be 
less  chan  1  unit  difference  (30°)  from  that  of  the 
central  edge  point. 

(2)  If  there  is>  more  than  1  possible  predecessor 
(successor),  a  fork  will  exist  if  they  are  not 
4-neighbors  or  if  their  orientations  differ  by  2 
units  (60°)  if  they  are  4-neighbors.  In  such  a 
case,  the  predecessor  (successor)  with  the  greater 
magnitude  is  encoded  as  the  primary  predecessor 
(successor).  If  the  possible  candidates  are  4- 
neighbors  with  the  same  orientation,  the  chosen  can¬ 
didate  is  the  nearer  of  the  two  in  the  Euclidean 
sense. 

The  interested  reader  is  referred  to  Refs.  1  and  2  for  more  details  on 
Che  rules  concerning  edge  linking.  Figure  18  shows  some  successor  con¬ 
figurations  illustrating  the  above  rules. 

The  logic  for  edge  linking  is  shown  in  Figure  19.  In  the  first 
section,  a  shifting  network  decodes  the  edge  orientation  of  the  central 
pixel  and  accesses  the  edge  data  for  the  three  possible  successor 
(predecessor)  locations.  This  edge  data  is  then  sent  to  a  comparison 
network  that  tests  whether  conditions  (1)  and  (2)  above  have  been  satis¬ 
fied.  The  results  are  then  sent  to  a  PLA  containing  about  40  minterms, 
Che  output  of  which  indicates  the  successor  and  whether  a  fork  is 
present.  The  coding  for  the  PLA  is  shown  in  Figure  20.  The  gate  count 
is  6  for  the  shifting  network,  72  for  the  comparison  network,  and  170 
for  the  PLA,  for  a  total  of  about  250  gates.  Formation  of  the  predecessor 
numbers  would  cake  another  similar  sized  network,  bringing  the  total 
number  of  gates  to  about  500. 


D.  Edge  Tracing 

In  this  step,  the  predecessor/ successor  (PS)  elements  formed  in 
the  previous  operation  are  linked  together.  The  data  format  for  the  PS 
file  is  shown  in  Figure  21.  Included  in  this  format  is  a  trace  bit 
which  indicates  whether  the  point  has  already  been  collected  The  PS 
elements  are  linked  in  three  passes: 
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Figure  20.  Coding  for  edge  linking  PLA. 
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(1)  In  the  first  pass,  a  raster  scan  is  made  of  the  PS 
files. in  search  of  an  edge  point  with  no  predecessor. 

Edge  tracing  begins  at  these  points,  and  only  the 
primary  successor  elements  are  linked  when  there  is 

a  fork. 

(2)  In  the  second  pass,  the  edge  points  that  have  forks 
are  revisited,  and  the  secondary  successor  elements 
are  linked. 

(3)  In  the  third  pass,  circular  edge  sements  with  no 
starting  or  fork  points  are  linked.  This  requires 
scanning  the  trace  file  to  find  an  edge  point  that 
has  not  been  collected  before  and  then  traceing  out 
the  circular  segment. 

It  would  be  undesriable  to  perform  edge  tracing  in  three  passes  if  the 
operation  is  to  be  done  in  real  time,  since  the  requirements  on  buffer¬ 
ing  and  speed  of  the  hardware  would  then  be  quite  severe.  The  three 
passes  for  the  operation  could  be  reduced  if  all  the  start  and  fork 
points  were  identified  in  a  previous  step,  and  the  addresses  stored  in 
a  list.  The  start  address  for  tracing  an  edge  segment  could  then  be 
provided  by  this  list  instead  of  by  scanning  the  PS  file.  This  could 
result  in  considerable  time  saving,  since  typically  less  than  10%  of 
an  image  are  edge  points  and  only  a  fraction  of  those  are  start  and 
fork  points. 

The  logic  for  edge  tracing  is  shown  in  Figure  22.  Two  5.5 
megabit  memories  are  needed  for  storing  the  PS  files.  While  edge  tracing 
is  being  performed  on  the  edge  data  in  one  memory,  the  other  memory  is 
being  filled  with  fresh  data.  The  "start  and  fork"  list  provides  the 
starting  address  of  an  edge  segment.  The  PS  information  fetched  for  this 
point  is  used  to  form  the  edge  linked  list,  and  also  to  generate  the 
address  of  the  successor  to  be  fetched.  The  address  modification  unit 
acts  as  a  controller  enforcing  the  three  rules  above  and  may  broadcast  the 
address  of  a  new  pixel  to  be  fetched,  modify  the  address  of  the  previous 
pixel  fetched,  or  decide  that  a  fork  is  present  and  broadcast  the 
address  of  the  secondary  successor  that  is  to  be  fetched. 


START 


Figure  22.  Logic  for  edge  tracing. 
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Table  2  summarizes  the  results  of  the  line  finder  system,  presenting 
the  number  of  gates  required  for  each  function.  As  with  the  texture 
system,  these  data  will  be  used  later  in  the  study  for  VLSI  partitioning. 

V.  SYSTEM  INTEGRATION  TO  HOST  PROCESSOR 

A  principal  area  of  our  present  work  is  to  analyze  and  develop 
hardware  that  will  be  integrated  into  a  commercially  available  test  sys¬ 
tem  to  make  it  portable  and  extendable.  One  factor  expected  to  affect 
the  partitioning  substantially  is  the  input/output  requirements  of  the 
system.  The  data  flow  graphs  specify  the  bus  widths  and  the  basic  data 
inputs  and  outputs,  but  absolute  bandwidths  and  control  requirements 
cannot  be  specified  until  an  operating  environment  has  been  defined.  In 
Che  past,  when  we  have  fabricated  CCD  LSI  image  processing  chips,  they 
have  been  used  in  a  stand  alone  system  with  dedicated  input  and  output 
devices.  The  bandwidths  of  these  devices  specified  tne  bandwidth  of  the 
data  paths  on  the  chip.  But  we  do  not  expect  some  of  the  systems  being 
examined  in  this  study  to  be  used  in  a  stand  alone  system;  instead  we 
expect  them  to  be  used  as  a  peripheral  device  to  a  host  general-purpose 
processor.  This  requires  then  that  we  expand  our  system  study  to 
include  the  implications  of  using  a  general  purpose  computer  to  host 
the  IU  systems. 

We  are  currently  acquiring  a  PDP  11/34  computer  system  to  provide 
us  with  a  means  of  collecting  our  own  experimental  data  on  the  inter¬ 
face  problems.  Our  immediate  plans  involve  integrating  the  residue  local 
area  processor  on  a  card  which  connects  directly  to  the  DEC  UNIBUS.  We 
plan  to  use  direct  memory  access  to  transfer  the  data  to  and  from  the 
processor,  since  this  will  free  the  host  processor  for  other  duties. 

This  experimental  work  will  be  complemented  by  a  study  of  other  systems 
that  use  special  purpose  image  processing  devices  as  peripheral  devices. 


VI. 


FUTURE  WORK 


The  work  described  above  will  continue  over  the  next  year  with 
particular  emphasis  on  the  following  areas: 

•  Development  of  VLSI  RADIUS  processor 

•  Design  and  partioning  of  IU  systems 

•  Design  of  segmenter  system 

•  Commonality  studies 

•  Simulations  of  designs 

•  Integration  of  system  to  host  PDP  11-34. 
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Table  1*  Gate  Count  for  Texture  Classification  System 


5  line  kernel  generation 

15  K  gates 

5x5  convolution 

27K  gates/channel 

5x5  variance 

10K  gates 

Normalization 

IK/ channel 

Large  window  statistical  calculation 

8. 2 5 K/ channel 

Transform  (M  input  channels) 

2.1K*M/output  channel 

[i] 


Table  2.  Gace  Count  for  Line  Finder  System 


Edge  detection 

178K 

Thinning 

190  gates 

Edge  linking 

500 

Edge  tracing 

12  Mbit  memory  +  5K  logic  gates 
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